1. Preface. This is an expository paper giving an account of the "goodness of
fit" test and the "two sample" test based on the empirical distribution function,
tests which were initiated by the four authors cited in the title. An attempt is
made here to give a fairly complete coverage of the history, development, present 
status, and outstanding current problems related to these topics. 

The reader is advised that the relative amount of space and emphasis allotted 
to the various phases of the subject does not reflect necessarily their intrinsic 
merit and importance, but rather the author’s personal interest and familiarity. 
Also, for the sake of uniformity the notation of many of the writers quoted has 
been altered so that when referring to the original papers it will be necessary to 
check their nomenclature. 


2. The empirical distribution function and the tests. Let $X_1, X_2, \dots, X_n$ be
independent random variables (observations) each having the same distribution
function $U(x) = \Pr\{X_i < x\}$ and put

(2.1)    $\epsilon(x) = \begin{cases} 0, & x \le 0, \\ 1, & x > 0. \end{cases}$

Then the (random) function

(2.2)    $F_n(x) = \frac{1}{n} \sum_{j=1}^{n} \epsilon(x - X_j)$

is called the empirical distribution function of the data. Clearly $F_n(x)$ is the
proportion of the $X_i$, $i = 1, 2, \dots, n$, which are less than $x$.
It is easy to calculate the first and second order moments

    $E(F_n(x)) = U(x)$,
    $\operatorname{Cov}(F_n(x), F_n(y)) = E(F_n(x) F_n(y)) - U(x) U(y) = \frac{1}{n}\, c(U(x), U(y))$,

where

(2.3)    $c(s, t) = \min(s, t) - st = s(1 - t)$, $0 \le s \le t \le 1$.


We quote a few classical consequences of the definition (2.2): 


Received April 10, 1957. Special Invited Paper read before the Institute of Mathemati- 
cal Statistics, August 21, 1956. 

1 Research carried out at the Statistical Research Center, University of Chicago, under 
sponsorship of the Statistics Branch Office of Naval Research. Reproduction in whole or in 
part is permitted for any purpose of the United States Government. 


D. A. DARLING 


Strong law of large numbers,

(2.4)    $F_n(x) \to U(x)$ with probability 1 for each $x$.

Law of the iterated logarithm,

    $\limsup_{n \to \infty} \frac{\sqrt{n}\, |F_n(x) - U(x)|}{\sqrt{2 \log\log n}} = \sqrt{U(x)(1 - U(x))}$

with probability 1 for each $x$.
Multidimensional central limit theorem, 


(2.5)    $\{\sqrt{n}\, (F_n(x_i) - U(x_i))\}$, $i = 1, 2, \dots, k$

has an asymptotic ($n \to \infty$, $k$ fixed) $k$-dimensional normal distribution, means
0 and covariance $c(U(x_i), U(x_j))$ with $c(s, t)$ given by (2.3).

Cantelli-Glivenko lemma ([29], [88]),

(2.6)    $\sup_{-\infty < x < \infty} |F_n(x) - U(x)| \to 0$ with probability 1.

The last result (2.6), which considerably generalizes (2.4), is itself capable of 
further extensions. Fortet and Mourier [22] have shown $\frac{1}{n} \sum_{j=1}^{n} f(X_j) \to
E(f(X_j))$ uniformly with respect to an inclusive family of functions $\{f\}$ with
probability 1. Then (2.6) follows on considering the family $f_\xi(x) = \epsilon(\xi - x)$,
$-\infty < \xi < \infty$, with $\epsilon(x)$ given by (2.1). Steinhaus [79] showed that the mutual
independence of the $X_i$ could be relaxed to pairwise independence and (2.6)
holds. See also Wolfowitz [89].
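
The definition of the empirical distribution function and the uniform convergence asserted by the Cantelli-Glivenko lemma can be illustrated numerically. The following is a minimal sketch, assuming observations from the uniform distribution on (0, 1) so that $U(x) = x$; the helper names `ecdf` and `sup_distance_uniform` are hypothetical and chosen only for this illustration.

```python
import bisect
import random

def ecdf(sample):
    """Return F_n, where F_n(x) is the proportion of observations strictly less than x."""
    xs = sorted(sample)
    n = len(xs)
    def F_n(x):
        # bisect_left counts the observations strictly below x,
        # matching the convention U(x) = Pr{X < x} used in this section.
        return bisect.bisect_left(xs, x) / n
    return F_n

def sup_distance_uniform(sample):
    """sup over x of |F_n(x) - x| for a sample assumed uniform on (0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    # F_n is a step function, so the supremum is approached at its jump points.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

random.seed(0)
for n in (10, 100, 1000, 10000):
    u = [random.random() for _ in range(n)]
    # The printed distances should shrink as n grows.
    print(n, round(sup_distance_uniform(u), 4))
```

For a general continuous $U$, the same function can be applied to the transformed values $U(X_j)$.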

The following two statistical problems motivate the analysis: 

(a) Goodness-of-fit problem. Let the $X_i$ be the random variables described in
the first sentence of this section. The goodness-of-fit problem is to devise a test
of the hypothesis

(2.7)    $H_0$: $U(x) = F(x)$,

where $F(x)$ is a given continuous distribution function. This is one of the classical
problems of statistics for which K. Pearson developed the well known $\chi^2$ test;
cf. Cochran [15].

(b) Two-sample problem. Let the $X_i$ be as above with $U(x)$ known to be continuous
and let $Y_1, Y_2, \dots, Y_m$ be independent random variables with the
common continuous distribution $V(x) = \Pr\{Y_i < x\}$, all $n + m$ of these random
variables being mutually independent. The two-sample problem is to devise
a test of the hypothesis

(2.8)    $H_0'$: $U(x) = V(x)$.


This is also an old, celebrated problem—cf. [53]. 

Roughly speaking, the tests proposed here of the null hypotheses $H_0$, $H_0'$ are
based on certain distribution analogues of the Cantelli-Glivenko lemma (2.6) in 
the same way that the central limit theorem is a distribution analogue of the 
law of large numbers. 

3. The Cramér-Smirnov tests. In 1928 Cramér [13] suggested for $H_0$ the
following test criterion:

    $\int_{-\infty}^{\infty} (F_n(x) - F(x))^2 \, dK(x)$,

where $K(x)$ is a suitable nondecreasing weight function. $H_0$ given by (2.7) is to be
rejected if this expression is too large. Von Mises [83] independently made an
equivalent suggestion and developed a few properties of the test.

Smirnov [71], [72] gave the modification

(3.1)    $W_n^2 = n \int_{-\infty}^{\infty} (F_n(x) - F(x))^2 \, \psi(F(x)) \, dF(x)$,

where $\psi(t)$, $0 \le t \le 1$, is a nonnegative weight function to be selected presumably
on the grounds of certain power requirements. The test based on $W_n^2$ is
distribution free; this is readily seen from (2.2) for, if (2.7) is true, we have,
recalling the continuity of $F(x)$,

(3.2)    $W_n^2 = n \int_{-\infty}^{\infty} \Bigl( \frac{1}{n} \sum_{j=1}^{n} \epsilon(x - X_j) - F(x) \Bigr)^2 \psi(F(x)) \, dF(x)
              = n \int_0^1 \Bigl( \frac{1}{n} \sum_{j=1}^{n} \epsilon(t - F(X_j)) - t \Bigr)^2 \psi(t) \, dt$,

with probability 1, and since the $F(X_j)$ are independent and uniformly
distributed over (0, 1) the result follows.

Besides being distribution free the test is consistent (if $\psi > 0$) and requires no
arbitrary grouping of the data; these three desirable properties are not shared
by the $\chi^2$ test of $H_0$.
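
Because the empirical distribution function is a step function, the integral defining the criterion can be evaluated exactly when the weight function is identically 1. The sketch below uses the standard order-statistic computing formula $W_n^2 = \frac{1}{12n} + \sum_{i=1}^{n} \bigl(u_{(i)} - \frac{2i-1}{2n}\bigr)^2$, where $u_{(i)}$ are the ordered values $F(X_j)$; this formula is not stated in the text above and is supplied here as an assumption, cross-checked against a direct midpoint-rule evaluation of the integral over (0, 1). The function names are hypothetical.

```python
import math

def cvm_statistic(sample, cdf):
    """W_n^2 with weight function 1, via the order-statistic computing formula."""
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    return 1.0 / (12 * n) + sum((u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))

def cvm_by_quadrature(sample, cdf, grid=20000):
    """Midpoint-rule value of n * integral over (0, 1) of (F*_n(t) - t)^2 dt."""
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    total, k = 0.0, 0
    for g in range(grid):
        t = (g + 0.5) / grid
        # k counts the transformed observations strictly below t.
        while k < n and u[k] < t:
            k += 1
        total += (k / n - t) ** 2
    return n * total / grid

# Illustrative use: test a small sample against the unit exponential distribution.
data = [0.3, 1.1, 0.05, 2.4, 0.7]
exp_cdf = lambda x: 1.0 - math.exp(-x)
print(cvm_statistic(data, exp_cdf), cvm_by_quadrature(data, exp_cdf))
```

The two printed values should agree to several decimal places, which illustrates that only the values $F(X_j)$ enter the statistic.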

Smirnov's basic result concerning the distribution of (3.1) if (2.7) is true is
that

(3.3)    $\lim_{n \to \infty} E\{e^{i \xi W_n^2}\} = (D(2 i \xi))^{-1/2}$,

where $D(\lambda)$ is the Fredholm determinant associated with the kernel

(3.4)    $k(s, t) = \sqrt{\psi(s)} \sqrt{\psi(t)}\, c(s, t)$, $0 \le s, t \le 1$,

$c(s, t)$ being given by (2.3).
Smirnov found the distribution function corresponding to this limiting
characteristic function in the following form [72]:

(3.5)    $\lim_{n \to \infty} \Pr\{W_n^2 < x\} = G(x) = 1 - \frac{1}{\pi} \sum_{j=1}^{\infty} (-1)^{j-1} \int_{\lambda_{2j-1}}^{\lambda_{2j}} \frac{e^{-xy/2}}{y \sqrt{-D(y)}} \, dy$,

where $\lambda_j$, $j = 1, 2, \dots$ are the (simple) zeros of $D(\lambda)$. Later Smirnov [77] gave
simpler proofs of these results.


2 The factor $(-1)^{j-1}$ is missing throughout [72].

von Mises [84] deduced (3.3) and considered a number of extensive generalizations
in the direction of nonidentically distributed $X_i$, and quadratic forms
other than the mean square.

There now exist quite simple proofs of (3.3) resting on a reduction to a simple
stochastic process, basically an idea of Doob [19] and Kac [42]. If we let $F_n^*(t)$ be
the empirical distribution function based on $F(X_1), F(X_2), \dots, F(X_n)$, then
from (3.2) we deduce that if

    $x_n(t) = \sqrt{n}\, (F_n^*(t) - t)$,

then

    $W_n^2 = \int_0^1 x_n^2(t) \psi(t) \, dt$.

From the fact that (2.5) has a limiting multidimensional normal distribution,
$x_n(t)$ converges in distribution to a Gaussian process $x(t)$ with mean 0 and
covariance $c(s, t)$ given by (2.3). If now $Q(f)$ is a "reasonable" functional to the
reals it is natural to conjecture that

(3.6)    $\lim_{n \to \infty} \Pr\{Q(x_n(t)) < x\} = \Pr\{Q(x(t)) < x\}$.


This being true for $Q(f) = \int_0^1 f^2(t) \psi(t) \, dt$, Smirnov's result (3.3) follows
immediately from a theorem of Kac and Siegert [41].

Kac [43] justified (3.6) for this $Q$ when $\psi = 1$, and Donsker [18] proved (3.6)
for a wide class of $Q$. There now exist very extensive generalizations of this
so-called invariance principle ([66], [57]).

The essential result of the line of attack in [41] is that for $z(t)$, $a \le t \le b$,
Gaussian,

    $E(z(t)) = 0$,
    $E(z(t) z(s)) = \Gamma(s, t)$,

the distribution of $W = \int_a^b z^2(t) \, dt$ is that of

(3.7)    $\sum_{j=1}^{\infty} \frac{G_j^2}{\lambda_j}$,

where $G_1, G_2, \dots$ are independent, normally distributed, means 0, variances 1,
and $\lambda_1, \lambda_2, \dots$ are the eigenvalues of the kernel $\Gamma(s, t)$, i.e., the zeros of the
Fredholm determinant $D(\lambda)$ of the integral equation

    $f(t) = \lambda \int_a^b \Gamma(t, s) f(s) \, ds$.

For the kernel (3.4), this result yields (3.3) immediately.

A systematic study of the limiting distribution of (3.1) was made in [1], and it
turns out that $D(\lambda)$ can be determined from an initial value equation. If $\psi(t)$ is
continuous in $0 \le t \le 1$, then

    $\varphi''(t) + \lambda \psi(t) \varphi(t) = 0$, $\varphi(0) = 0$, $\varphi'(0) = 1$,

has a unique solution $\varphi_\lambda(t)$ and the Fredholm determinant $D(\lambda)$ of (3.4) is

    $D(\lambda) = \frac{\varphi_\lambda(1)}{\varphi_0(1)}$.

For the important case $\psi = 1$, the limiting characteristic function (3.3) is
$(\sqrt{2 i \xi} \csc \sqrt{2 i \xi})^{1/2}$. This was inverted in [1] in a form different from (3.5) and a
table given of the limiting distribution of $W_n^2$. For the statistically appealing
weight function

(3.8)    $\psi(t) = \frac{1}{t(1 - t)}$,

the limiting characteristic function is $\sqrt{-2 \pi i \xi / \cos(\tfrac{\pi}{2} \sqrt{1 + 8 i \xi})}$, which was
also inverted [1] and a few significance points given [2].
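
As a worked instance of the initial value characterization, the case of a weight function identically 1 can be carried through by hand. The following is a sketch of that calculation, supplied here for illustration rather than taken from [1]. It recovers the characteristic function quoted above and, as a consistency check, compares the sum of the reciprocal eigenvalues (the mean of the limiting distribution under the series representation of the Kac-Siegert result) with the integral of $c(t, t)$.

```latex
% Weight function identically 1: solve the initial value problem.
\varphi''(t) + \lambda \varphi(t) = 0, \quad \varphi(0) = 0, \quad \varphi'(0) = 1
\;\Longrightarrow\;
\varphi_\lambda(t) = \frac{\sin(\sqrt{\lambda}\, t)}{\sqrt{\lambda}}, \qquad \varphi_0(t) = t .

% Fredholm determinant and its zeros.
D(\lambda) = \frac{\varphi_\lambda(1)}{\varphi_0(1)} = \frac{\sin \sqrt{\lambda}}{\sqrt{\lambda}},
\qquad \lambda_j = j^2 \pi^2, \quad j = 1, 2, \dots

% Limiting characteristic function.
(D(2 i \xi))^{-1/2} = \left( \frac{\sqrt{2 i \xi}}{\sin \sqrt{2 i \xi}} \right)^{1/2}

% Consistency check on the mean of the limiting distribution.
\sum_{j=1}^{\infty} \frac{1}{\lambda_j} = \frac{1}{\pi^2} \sum_{j=1}^{\infty} \frac{1}{j^2} = \frac{1}{6}
= \int_0^1 t (1 - t) \, dt = \int_0^1 c(t, t) \, dt .
```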

There is no multivariate analogue to the $W_n^2$ test which is distribution free
(unless the components are independent). There is, however, a transformation
of a multivariate distribution to a uniform distribution over the unit cube due to
Lévy, and Rosenblatt [68] suggested an analogue to $W_n^2$ for it and obtained [69]
a few results for the corresponding limiting distribution.

For $H_0'$ of (2.8) a corresponding distribution free test exists; cf. Lehmann [53].
The natural analogue to (3.1) is

    $\frac{mn}{m+n} \int_{-\infty}^{\infty} (F_n(x) - G_m(x))^2 \, \psi\Bigl( \frac{n F_n(x) + m G_m(x)}{m+n} \Bigr) \, d\Bigl( \frac{n F_n(x) + m G_m(x)}{m+n} \Bigr)$,

where $F_n(x)$ and $G_m(x)$ are respectively the empirical distribution functions of
the $X$'s and the $Y$'s. It is easy to prove when (4.4) below holds that this has the
same limiting distribution (if $H_0'$ is true) as $W_n^2$ of (3.1); cf. [69] for the case
$\psi = 1$.

4. The Kolmogorov-Smirnov tests. In 1933 Kolmogorov [45] suggested a test
of $H_0$ of (2.7) based on the statistic

(4.1)    $K_n = \sqrt{n} \sup_{-\infty < x < \infty} |F_n(x) - F(x)|$.

$H_0$ is to be rejected if $K_n$ is sufficiently large. The distribution of $K_n$ is
independent of $F(x)$ if (2.7) is true (i.e., the test is distribution free) and denoting
its distribution by $\Phi_n(x)$ Kolmogorov proved that

(4.2)    $\lim_{n \to \infty} \Pr\{K_n < x\} = \lim_{n \to \infty} \Phi_n(x) = \Phi(x) = \sum_{j=-\infty}^{\infty} (-1)^j e^{-2 j^2 x^2}$.

If $F(x)$ is not continuous, $\Pr\{K_n < x\} \ge \Phi_n(x)$, so the test could be used
conservatively even if the $X_i$ have not a continuous distribution. Smirnov [74] gave
a simpler proof of (4.2) and also a distribution free test of $H_0'$. He proved that
the random variable

(4.3)    $D_{mn} = \sqrt{\frac{mn}{m+n}} \sup_{-\infty < x < \infty} |F_n(x) - G_m(x)|$,

with distribution function $\Phi_{mn}$, had, if

(4.4)    $0 < a \le \frac{m}{n} \le b < \infty$,

a limiting distribution $\Phi$ given by (4.2).
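
The one-sample statistic (4.1) and the limiting distribution (4.2) are straightforward to evaluate. The following is a minimal sketch with hypothetical function names; it assumes a continuous hypothesized distribution, truncates the series (4.2) after a fixed number of terms (adequate unless $x$ is very small), and is intended only as an illustration of the definitions.

```python
import math

def kolmogorov_statistic(sample, cdf):
    """K_n of (4.1): sqrt(n) * sup over x of |F_n(x) - F(x)|, for continuous F."""
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    # The supremum of the step-function discrepancy is approached at the jumps.
    d = max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(u))
    return math.sqrt(n) * d

def kolmogorov_limit_cdf(x, terms=100):
    """Phi(x) of (4.2): the sum over all integers j of (-1)^j exp(-2 j^2 x^2)."""
    if x <= 0:
        return 0.0
    # The j and -j terms coincide, so the two-sided sum is 1 plus twice the one-sided sum.
    tail = sum((-1) ** j * math.exp(-2.0 * j * j * x * x) for j in range(1, terms + 1))
    return 1.0 + 2.0 * tail

# Illustrative use: approximate significance level of an observed K_n.
data = [0.12, 0.35, 0.41, 0.58, 0.77, 0.93]
k = kolmogorov_statistic(data, lambda x: x)   # hypothesized F uniform on (0, 1)
print(k, 1.0 - kolmogorov_limit_cdf(k))
```

The printed tail probability relies on the limiting distribution and is therefore only approximate for a sample this small.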
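For the corresponding one-sided tests define

(4.5)    $K_n^+ = \sqrt{n} \sup_{-\infty < x < \infty} (F_n(x) - F(x))$,

(4.6)    $D_{mn}^+ = \sqrt{\frac{mn}{m+n}} \sup_{-\infty < x < \infty} (F_n(x) - G_m(x))$,

(4.7)    $D_{mn}^- = \sqrt{\frac{mn}{m+n}} \sup_{-\infty < x < \infty} (G_m(x) - F_n(x))$.

Smirnov ([74], [75]) gave limiting distributions of these random variables under
condition (4.4):

(4.8)    $\lim \Pr\{K_n^+ < x\} = \lim \Pr\{D_{mn}^+ < x\} = 1 - e^{-2 x^2}$, $x \ge 0$,

(4.9)    $\lim \Pr\{D_{mn}^+ < x, D_{mn}^- < y\} = \Phi(x, y)
              = 1 + \sum_{j=1}^{\infty} \{ 2 e^{-2 j^2 (x+y)^2} - e^{-2 (jx + (j-1)y)^2} - e^{-2 ((j-1)x + jy)^2} \}$, $0 \le x, y$.
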


The early work of Kolmogorov and Smirnov is summarized in [46] and [75]. A 
short table of the distribution ® of (4.2) was given in [74] and amplified in [76]. 
Corrections to the tables are in [50], [51], and extensive percentage points in [65].

Wald and Wolfowitz ([85], [86]), in connection with a problem of finding confidence
limits for an unknown distribution function considered independently the
distribution of K, of (4.1), giving methods of calculating its distribution for 
finite n. For elementary expository remarks and applications, cf. [39]. 

Feller [21] rederived (4.2). A strong counterpart of (4.2) was given by Chung
[12] who proved that infinitely many inequalities

    $\sup_{-\infty < x < \infty} \sqrt{n}\, |F_n(x) - F(x)| > \lambda_n$

occur with probability zero or one according as

    $\sum_n \frac{\lambda_n^2}{n} e^{-2 \lambda_n^2}$

converges or diverges.

Doob [19] showed that $\Phi$ of (4.2) is given by

(4.10)    $\Phi(x) = \Pr\{ \sup_{0 \le t \le 1} |x(t)| < x \}$,

where $x(t)$ is the Gaussian process of (3.6). Doob omitted the justification of
(3.6) for $Q(f) = \sup |f|$, which was supplied by Donsker [18]. Doob observed
that the Gaussian process $x(t)$ with mean 0 and covariance (2.3) was simply
transformable to the Wiener process $w(t)$, $0 \le t < \infty$, and that the probability
(4.10) is a simple first passage probability for that process. Similarly for the
limiting distributions of (4.3), (4.5), and (4.6).

Using this last observation a generalization of the $K_n$ test was proposed [1] as
follows:

(4.11)    $K_n^* = \sup_{-\infty < x < \infty} \sqrt{n}\, |F_n(x) - F(x)| \, \psi(F(x))$,

where $\psi \ge 0$ is a preassigned weight function. The limiting distribution of $K_n^*$
can be obtained then as the solution to a boundary value problem associated
with the simple diffusion equation. If $\psi(t) = (\alpha t + \beta)^{-1}$ in a piecewise way the
classical methods give the limiting distribution in quadratures [1].

These latter include the case of detecting discrepancies over a central portion
of the interval ([1], [55], [56]) where

(4.12)    $\psi_1(t) = \begin{cases} 1, & a < t < b, \\ 0, & \text{otherwise}, \end{cases}$

and over the tails $1 - \psi_1(t)$, [4], and the cases $\psi_2 = 1/t$, $\psi_3 = 1/(1 - t)$ for $t$ in
a subinterval of (0, 1); see [67], [11], and [54]; cf. also [27].

For $\sqrt{\psi}$ where $\psi$ is given by (3.8), the distribution of $K_n^*$ was given in [1]; and
when $m = n \to \infty$, the limiting distribution of $D_{mn}$ of (4.3) has been treated
[52] analogously with the weight function $\psi_1$ of (4.12).

Interest of late has been in calculating the distribution of these random variables
for finite sample sizes (always under the assumption that $H_0$, $H_0'$ of (2.7)
and (2.8) are true). In [85] a method of calculating the distribution of $K_n$ of
(4.1) was given, applicable when $n$ is small. A series of recurrence relations were
given in [45] for calculating the distribution of $K_n$, and it was suggested much
later [5] that these may be amenable to high-speed calculation; the program
was subsequently carried out ([58], [7]) giving tables of the distribution of (4.1).
For $D_{mn}$ similarly, cf. [80].

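Birnbaum and Tingey [6] proved that for (4.5)

    $\Pr\{K_n^+ > \epsilon \sqrt{n}\} = (1 - \epsilon)^n + \epsilon \sum_{1 \le j \le n(1 - \epsilon)} \binom{n}{j} \Bigl( 1 - \epsilon - \frac{j}{n} \Bigr)^{n-j} \Bigl( \epsilon + \frac{j}{n} \Bigr)^{j-1}$.
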
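
The Birnbaum and Tingey expression is a finite sum and can be evaluated directly. The sketch below is one possible transcription, with a hypothetical function name; it assumes $0 < \epsilon < 1$ and moderate $n$ (the binomial coefficients are converted to floating point, so very large $n$ would overflow), and compares the exact tail with the limiting value $e^{-2x^2}$ at $x = \epsilon\sqrt{n}$.

```python
import math

def birnbaum_tingey_tail(n, eps):
    """Pr{K_n^+ > eps * sqrt(n)}, i.e. Pr{sup (F_n - F) > eps}, for 0 < eps < 1."""
    total = (1.0 - eps) ** n
    for j in range(1, math.floor(n * (1.0 - eps)) + 1):
        # Guard against a base that rounding pushes slightly below zero.
        base = max(0.0, 1.0 - eps - j / n)
        total += eps * math.comb(n, j) * base ** (n - j) * (eps + j / n) ** (j - 1)
    return total

# Illustrative comparison of the exact tail with the limiting tail exp(-2 x^2).
for n in (5, 20, 100, 400):
    x = 1.0
    eps = x / math.sqrt(n)
    print(n, round(birnbaum_tingey_tail(n, eps), 4), round(math.exp(-2.0 * x * x), 4))
```

For $n = 1$ the sum is empty and the expression reduces to $1 - \epsilon$, as it should, since the supremum is then one minus a uniform variable.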


Gnedenko and his students have recently studied systematically (4.3), (4.5),
(4.6), and (4.7), mainly in the case of equal sample sizes $m = n$. We abbreviate
in this case $D_{nn} = D_n$, $D_{nn}^+ = D_n^+$, $D_{nn}^- = D_n^-$. The distribution of $D_n$, $D_n^+$ and
$D_n^-$ can be reduced to first passage problems associated with simple random walks
[30], [32], [34], and [47]. Consider, e.g., the distribution of $D_n^+$. If the pooled
sample of size $2n$, $X_1, X_2, \dots, X_n, Y_1, \dots, Y_n$, is arranged in increasing magnitude
and we denote by $z_i$, $i = 1, 2, \dots, 2n$ a random variable equal to $+1$ or
$-1$ according as the $i$th member of it is an $X$ or $Y$ respectively, then if $H_0'$ of
(2.8) is true and $S_j = z_1 + z_2 + \dots + z_j$, ($S_0 = 0$),

    $\Pr\{D_n^+ < x\} = \Pr\{ \max_{1 \le j \le 2n} S_j < x \sqrt{2n} \}$.

The set $S_0, S_1, \dots, S_{2n} = 0$ form a Markov chain, and the probability in question
is given by a simple reflection principle [3]. One obtains in fact

    $\Pr\{D_n^+ < x\} = 1 - \binom{2n}{n + [-x \sqrt{2n}]} \Big/ \binom{2n}{n}$, $0 \le x \le \sqrt{n/2}$,

and similar simple formulas for the distributions of $D_n$ and the joint distribution
of $D_n^+$ and $D_n^-$ for finite $n$.
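
The reduction to a random walk lends itself to a direct check for small $n$. The sketch below, with hypothetical function names, evaluates the probability that the maximum of the walk stays below $x\sqrt{2n}$ in two ways: by the reflection-principle count (the number of the $\binom{2n}{n}$ paths that reach level $c$ is $\binom{2n}{n-c}$, where $c$ is the smallest integer not less than $x\sqrt{2n}$), and by brute-force enumeration of all equally likely orderings of the pooled sample. The closed form used in the code is an assumption based on that standard reflection argument.

```python
import itertools
import math

def dn_plus_cdf(n, x):
    """Pr{D_n^+ < x} for two samples of equal size n, by the reflection principle."""
    c = math.ceil(x * math.sqrt(2 * n))   # smallest integer level the walk must avoid
    if c <= 0:
        return 0.0
    if c > n:
        return 1.0
    return 1.0 - math.comb(2 * n, n - c) / math.comb(2 * n, n)

def dn_plus_cdf_bruteforce(n, x):
    """Same probability by enumerating all C(2n, n) orderings of the pooled sample."""
    threshold = x * math.sqrt(2 * n)
    favourable = total = 0
    for x_positions in itertools.combinations(range(2 * n), n):
        pos = set(x_positions)
        s = best = 0
        for i in range(2 * n):
            s += 1 if i in pos else -1    # +1 for an X, -1 for a Y
            best = max(best, s)
        total += 1
        favourable += best < threshold
    return favourable / total

# Illustrative use: the two evaluations should coincide.
print(dn_plus_cdf(4, 0.5), dn_plus_cdf_bruteforce(4, 0.5))
```

Enumeration grows like $\binom{2n}{n}$ and is practical only for small $n$, whereas the closed form costs two binomial coefficients.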

There exist many other results in this direction, too numerous to treat in detail;
we mention several of the simpler in their limiting form:

    $\lim_{n \to \infty} \sqrt{2n}\, \Pr\Bigl\{ D_n^+ + D_n^- = \frac{1}{\sqrt{2n}} [x \sqrt{2n}] \Bigr\} = 8x \sum_{j=1}^{\infty} (4 j^4 x^2 - 3 j^2) e^{-2 j^2 x^2}$,

    $\lim_{n \to \infty} \rho(D_n^+, D_n^-) = \frac{2 \pi^2 - 3 \pi - 12}{3 (4 - \pi)}$,

cf. [34];

    $\lim_{n \to \infty} \Pr\{F_n(x) > G_n(x) \text{ for all } x \text{ such that } \alpha < U(x) < \beta\} = \frac{1}{\pi} \sin^{-1} \sqrt{\frac{\alpha (1 - \beta)}{\beta (1 - \alpha)}}$,

cf. [37], [24], and [70]; where $\rho$ is the correlation coefficient and $U(x)$ the common
distribution of the $X_i$ and $Y_i$. Much of the work of Gnedenko and his co-workers
is summarized in [37] and [38].

The random walk method of treating $D_n^+$, $D_n$ was employed independently in
[20], and tables of the distribution of $D_n$ were constructed ([61], [62]), for finite
$n$ using methods unrelated to the above.

For unequal sample sizes, the distributions of $D_{n,np}$, $D_{n,np}^+$, $p$ integral, can
be again reduced to a random walk problem ([48], [49], [10]) of a somewhat more
complex kind, but still amenable to the reflection principle.

The exact formulas lead to asymptotic expansions of $K_n$, $K_n^+$, $D_n$, $D_n^+$ of
which (4.2), (4.8), (4.9) are the leading terms; but Smirnov's original analysis
required only (4.4) to hold for $D_{mn}^+$, $D_{mn}$ rather than equal sample sizes $m = n$.


5. Other tests. Besides the tests described in the preceding two sections, there
are a number of others based on the behavior of the empirical distribution function.

Smirnov [73] discussed the number of crossings $N_n$ of $F_n(x)$ and $F(x)$. If (2.7)
is true he proved that

    $\lim_{n \to \infty} \Pr\{N_n < t \sqrt{n}\} = 1 - e^{-t^2/2}$,

and gave generalizations. The distribution of the number of crossings of $F_n(x)$
and $G_m(x)$ is known [64] for $m = n$ finite.

For the case of two samples of size $n$, $m = np$ respectively, $p \ge 1$ integral,
Gnedenko and Mihalevič ([31], [36]) proved that if $J$ is the number of
"positive jumps" of $F_m(x)$, i.e., the number of $X_k$, $k = 1, 2, \dots, m$ such that
$F_m(X_k - 0) = (k - 1)/m \ge G_n(X_k)$, then $J$ has the simple distribution

    $\Pr\{J = j\} = \frac{1}{m + 1}$, $j = 0, 1, \dots, m$.

From this last result it follows (letting $p \to \infty$) that if $A_n$ is the sum of the
vertical parts of the graph of $F_n(x)$ which exceed $F(x)$, i.e.,


An = | (Pa(z) — F(a))e(Fa(2) — F(z) aP,(2), 
where $\epsilon(x)$ is given by (2.1), then $A_n$ is uniformly distributed over (0, 1):

    $\Pr\{A_n < x\} = x$, $0 \le x \le 1$.


The limiting form of this theorem was found earlier by Kac [42], who
gave a general method for finding the limiting distribution of

    $\int_{-\infty}^{\infty} V(F_n(x) - F(x)) \, dF(x)$,

for quite general functions $V$. Kac also considers the statistic corresponding to
$K_n$ of (4.8) when the sample size $n$ is chosen at random with a Poisson distribution
whose parameter goes to infinity.

Smirnov [78] considered using $F_n(x)$ to construct confidence limits, not for
$U(x)$, but for its density by using a statistic similar to $K_n$.

The effect of grouping the data on the tests has been discussed for the $D_n$,
$D_n^+$ tests in [24], [25], and [28]; the $K_n$, $K_n^+$ tests in [40], [23], and [33]; and the
$W_n^2$ test in [87].


6. The parametric case. The two null hypotheses $H_0$, $H_0'$ of (2.7) and (2.8)
are simple, and it is desirable to extend the tests to composite null hypotheses
[14]. Some attention has been given to this problem lately for the hypothesis $H_0$.

We suppose, instead of (2.7),

(6.1)    $H_0^*$: $U(x) = F(x, \theta)$, $\theta \in \Theta$,

where the parameter $\theta$ ranges over a set $\Theta$. For the case when $\Theta$ consists of an
interval of the reals, $a < \theta < b$, a test of $H_0^*$ analogous to $W_n^2$ of (3.1) was
introduced in [16]:

(6.2)    $C_n^2 = n \int_{-\infty}^{\infty} (F_n(x) - F(x, \hat\theta_n))^2 \, dF(x, \hat\theta_n)$,

where $\hat\theta_n$ is an estimator of $\theta$. $H_0^*$ is to be rejected if $C_n^2$ is sufficiently large. The
chief result here [16] is that, under suitable regularity conditions, if $\operatorname{Var}(\hat\theta_n)$ goes
to zero sufficiently rapidly (the superefficient case), the limiting distributions of
$C_n^2$ and $W_n^2$ (with $\psi = 1$ in (3.1)) are the same, and if $\theta$ admits a "regular
estimator" $\hat\theta_n$, then the limiting distribution of $C_n^2$ is that of $\int_0^1 y^2(t) \, dt$, where $y(t)$
is a Gaussian process with mean 0 and covariance

(6.3)    $k(s, t) = c(s, t) - \varphi(s) \varphi(t)$,

with $c(s, t)$ given by (2.3) and

    $\varphi(F(x, \theta)) = \lim_{n \to \infty} \sqrt{n \operatorname{Var}(\hat\theta_n)} \; \frac{\partial}{\partial \theta} F(x, \theta)$,

$\hat\theta_n$ being an asymptotically unbiased minimum variance estimator. The limiting
distribution of $C_n^2$ is then given by (3.5) for $D(\lambda)$ the Fredholm determinant of
the kernel (6.3).

The test criterion $C_n^2$ of (6.2) is in its limiting form not generally distribution
free; i.e., the limiting distribution of (6.2), if (6.1) is true, depends in general
on the true unknown value of $\theta$ and the structure of the family $F(x, \theta)$, unlike
the $W_n^2$ test of (2.7). In the important special cases where $\theta$ is a location, scale,
or exponential parameter, the limiting distribution is independent of the particular
value of $\theta$ obtaining, which makes the test usable.

We quote one result: Let

    $F(x, \theta) = \frac{1}{2} + \frac{1}{\pi} \tan^{-1}(x - \theta)$, $-\infty < \theta < \infty$, $-\infty < x < \infty$;

i.e., we want to test if a sample of data came from some Cauchy distribution with
unspecified median. Then [16]


. . 2 
D(\) = = o (= :) (1 — cos ~V)), 
and the limiting distribution of (6.2) is that of (3.7), where the $\lambda_j$ are the zeros
of this $D(\lambda)$.

In [44] the case where $F(x, \theta)$ is a normal family with unknown mean and variance
is treated in some detail, similar to the above analysis, and important power
comparisons with the classical $\chi^2$ test were made (cf. Sec. 7). In [26] the problem
was treated from a different viewpoint, with grouped data using an analogue of
the $K_n$ test of (4.1) and under a condition that the normalized estimator converged
to a fixed value.

There seem to be no results for finite sample sizes, or a corresponding test of
$H_0'$, or a direct analogue of the $K_n$ test. And there does not seem to be a single
example where the limiting distribution of $C_n^2$ is known in a reasonable analytic
form.


7. Power of the tests. In the research quoted thus far, the principal effort has 
been to obtain distributions and limiting distributions under the null hypotheses, 
with occasional fleeting and unsystematic remarks on the power of the tests. 
This important facet of the problem has only lately been studied and the results 
are still quite fragmentary concerning the optimum choice and relative power of 
the tests. 

The choices of the weight functions (3.8), (4.12), etc., were made on more or 
less intuitive grounds to maximize the power of the tests against a rather vaguely 
defined class of alternatives; and indeed not only for the present tests but with 
other related distribution free tests (Wilcoxon, run, ranking, sign tests, etc.), 
there are fundamental and as-yet-unsolved problems as to delineating the classes 
of alternate hypotheses and of establishing realistic power comparisons. 

Massey ([59], [63]) showed that the $K_n$ test was consistent and biased, and he
gave a lower bound for the power. Birnbaum [9] considered the $K_n^+$ test of (4.5)
and a class of alternate hypotheses to (2.7) of the form

    $\sup_{-\infty < x < \infty} (U(x) - F(x)) = \delta$,

and obtained best possible upper and lower bounds for the power for finite $n$,
and for $n \to \infty$. The power of the $D_{mn}$ test of (4.3) was compared with the $\chi^2$
test [60], and in the case of a normal family with unknown mean and variance,
the $C_n^2$ test of (6.2) was found [44] to have considerable power advantage over
the $\chi^2$ test for alternatives to (2.7) of the form


(7.1) / (U(x) — F(x))’ dU(z) 


o— 


sup | U(x) — F(x) |} 
—n< rw 

For example [44], when the class of alternatives (7.1) is considered for $\delta$ sufficiently
small, the size of the test being $< \tfrac{1}{2}$, if it takes a sample size $N$ for the $\chi^2$
test to achieve a minimum power $\tfrac{1}{2}$ against all alternatives (7.1), then the $C_n^2$
test with the same size will need asymptotically only $a N^{4/5}$ observations to attain
the same minimum power. Similar remarks hold for the alternatives (7.2) with
a parametric extension of the $K_n$ test.

The asymptotic power of the tests of $H_0$ of (2.7) can be studied by considering,
e.g., alternatives to (2.7) of the form

(7.3)    $U(x) = F(x) + \frac{1}{\sqrt{n}} G(x)$,

where G(x) is a specified function, and the merits of the various tests can be
compared by considering the limiting probabilities with which (2.7) is rejected
if (7.3) is true; and if the asymptotically most powerful test of (2.7) against (7.3)
exists (and is known), one has the concept of asymptotic efficiency against the
sequence of alternatives (7.3).

In the case of a normal distribution with mean 0 and variance 1, the alternatives
being normal distributions with means $\theta$, variances 1, $\theta \ne 0$, the known
uniformly most powerful unbiased test of (2.7) was compared with the $K_n$ test
of (4.1) in [54], with the $K_n$ test showing up fairly poorly, as might be expected.
For the $W_n^2$ test (with $\psi = 1$), the limiting distribution of (3.1) when (7.3) is true
has been found under certain regularity conditions on $F(x)$, $G(x)$ by T. W.
Anderson, and is that of

(7.4)    $\int_0^1 [x(u) - k(u)]^2 \, du$,

where $x(u)$, $0 \le u \le 1$, is a Gaussian process mean 0, covariance $c(s, t)$ of (2.3),
and $k(u)$ is a certain function depending on $F(x)$ and $G(x)$. The distribution of
(7.4) can be studied by methods similar to those in Sec. 3.

Alternatives to $H_0'$ of (2.8) of the form $U(x) = V^k(x)$, $k = 2, 3, \dots$ have been
investigated ([53], [82]) and power comparisons made for a number of tests including
the $D_{mn}$ test of (4.3).

For very small sample sizes, the exact distributions of $K_n$, $K_n^+$, $D_{mn}$, $D_{mn}^+$
can be computed by brute force when $H_0$, $H_0'$ are not necessarily true; and there
has been some recent work of rather special character on their power. If $F(x)$ is
normal mean 0, variance 1, and $U(x)$ is normal mean $\mu > 0$, variance 1, the $K_n^+$
test of (4.5), $n = 2, 3, 5$ has been compared [81] with the classical uniformly
most powerful test. For $U(x)$, $V(x)$ normal, different means, variance $\sigma^2$, the test of
$H_0'$ has similarly been investigated: $\sigma$ known [17], $\sigma$ unknown [82], and comparisons
have been made with various other distribution-free tests. The $K_n$ and
$D_{mn}$ tests do not perform exceptionally well, as might be surmised, and for increasing
$m$, $n$, their relative power is conjectured [81] to decrease.

Of course, essentially nothing in the way of an absolute judgement of the 
merits of the tests can be attained by such studies, since the alternatives against 
which the tests described here are supposed to have good power have little re- 
lation to the above alternatives against which the classical tests have maximum 
power. 


POLYA-TYPE DISTRIBUTIONS, III: ADMISSIBILITY FOR
MULTI-ACTION PROBLEMS


By SAMUEL KARLIN
Stanford University

In the previous publications, [1], [2], and [3], several types of decision problems
associated with the general two-action problem (e.g., generalized testing
problems) have been investigated under the condition that the underlying distributions
have a density $p(x, \omega)$ with $x$ the observed value, $\omega$ the state of nature,
such that $p(x, \omega)$ is Pólya-type. In this paper we continue the study of the properties
of best procedures in the case of multiple-action problems and problems of
estimation. This part of the investigation is concerned principally with some
detailed statistical queries for the special case when $p(x, \omega)$ is Pólya-type 2 or in
more common statistical terminology when $p(x, \omega)$ possesses a monotone likelihood
ratio. The main problem dealt with in the present manuscript is the question
of admissibility of so-called monotone procedures.

To set up a common language, we summarize the statement of the general
multi-action decision problem. The $n$ action problem is usually formulated as
follows: A real random variable $X$ is observed (usually a sufficient statistic)
whose distribution $P(x, \omega)$ has the form

    $P(x, \omega) = \int_{-\infty}^{x} p(\xi, \omega) \, d\mu(\xi)$,

where the density $p(\xi, \omega)$ has a monotone likelihood ratio and the parameter, $\omega$,
describes the state of nature. For a fixed value of one of the arguments $P(x, \omega)$
will be assumed to be a continuously differentiable function of the other argument.
Throughout our discussion we may assume that $\omega$ ranges over an interval
$\Omega$ of real values (for definiteness, let $\Omega = (-\infty, \infty)$) and that $\mu$ is a completely
additive measure defined over the Borel field of subsets of the real line.

It is known from the theory of distributions with a monotone likelihood ratio
that the set of possible observations $X_\omega = \{x \mid p(x, \omega) > 0\}$ form an interval.
We shall further assume that $X_\omega$ is independent of $\omega$. That $p(\xi, \omega)$ has a monotone
likelihood ratio (strict) means that $p(x_1, \omega_1) p(x_2, \omega_2) - p(x_1, \omega_2) p(x_2, \omega_1) \ge
0$ ($> 0$) for $x_1 < x_2$ and $\omega_1 < \omega_2$ with $x_i$ belonging to $X$ and $\omega_i$ in $\Omega$. Most of the
standard densities occurring in statistical practice possess a monotone likelihood
ratio. This class of densities includes, in particular, the exponential family, the
non-central $t$, and the non-central $F$. The basic property of densities with a
monotone likelihood ratio useful in our analysis is its variation diminishing nature.
That is, if $h(\omega)$ changes sign once from, say, non-negative to non-positive
values, then

    $g(x) = \int h(\omega) p(x, \omega) \, dF(\omega)$

changes sign at most once. If, in fact, $g(x)$ does change signs, then as $x$ increases,
the values of $g(x)$ must also change in the direction of non-negative to
non-positive values.
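
The monotone likelihood ratio inequality and the variation diminishing property can both be checked numerically for a specific family. The sketch below is purely illustrative: it assumes the normal density with unit variance and mean equal to the state of nature, takes the distribution $F$ to be equal point masses on a finite grid of states, and uses hypothetical helper names throughout.

```python
import math

def normal_density(x, w):
    """Assumed illustrative family: normal density with mean w and unit variance."""
    return math.exp(-0.5 * (x - w) ** 2) / math.sqrt(2.0 * math.pi)

def has_monotone_likelihood_ratio(p, xs, ws):
    """Check p(x1,w1)p(x2,w2) - p(x1,w2)p(x2,w1) >= 0 on a grid with x1 < x2, w1 < w2."""
    for i, x1 in enumerate(xs):
        for x2 in xs[i + 1:]:
            for k, w1 in enumerate(ws):
                for w2 in ws[k + 1:]:
                    # Small negative tolerance absorbs floating-point rounding.
                    if p(x1, w1) * p(x2, w2) - p(x1, w2) * p(x2, w1) < -1e-15:
                        return False
    return True

def sign_changes(values, tol=1e-12):
    """Number of sign changes in a sequence, ignoring values that are numerically zero."""
    signs = [1 if v > 0 else -1 for v in values if abs(v) > tol]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def g(x, h, ws):
    """g(x) = integral of h(w) p(x, w) dF(w), with F equal point masses on the grid ws."""
    return sum(h(w) * normal_density(x, w) for w in ws) / len(ws)

grid = [-3.0 + 0.5 * i for i in range(13)]          # increasing grid for both x and w
h = lambda w: 1.0 if w < 0.7 else -1.0             # one sign change, + to -
values = [g(x, h, grid) for x in grid]
print(has_monotone_likelihood_ratio(normal_density, grid, grid), sign_changes(values))
```

A grid check of this kind cannot prove the property, but it gives a quick sanity test of a candidate density before the theory is applied.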

There exist $n$ possible actions which a statistician may take. When taking action
$i$, the loss is assumed to be measured by the function $L(i, \omega) = L_i(\omega)$, $i = 1,
\dots, n$, where $\omega$ represents the true state.

Throughout what follows the loss functions and the densities are assumed to 
satisfy enough smoothness properties to insure the existence of all integrals in- 
volving these quantities, as well as to justify all differentiation arguments. The 
order of the operations of differentiation and integration will be reversed on 
several occasions in the analysis. The assumption of validity for such inter- 
changes is not overly stringent in view of the fact that except in Sects. 5 and 6 
the loss functions are step-functions and the density is continuously differen- 
tiable. 

In addition, it will be assumed henceforth that the loss functions $L_i(\omega)$ satisfy
appropriate monotonicity assumptions. The precise statement of this is as follows:
The functions $L_i(\omega) - L_{i+1}(\omega)$ ($i = 1, 2, \dots, n - 1$) as functions of $\omega$
have exactly one change of sign and the sets $S_i = \{\omega \mid L_i(\omega) = \min_j L_j(\omega)\}$ are
non-degenerate intervals having the additional property that

    $S_1 < S_2 < \dots < S_n$,

where $S_i < S_{i+1}$ means that $S_i$ lies to the left of $S_{i+1}$ with only the boundary
points as common members for two successive $S_i$. In the case of such a loss
structure, we say that the statistical problem has a monotone preference pattern.
This is to suggest that if the parameter $\omega$ were known then the various actions
1, 2, $\dots$ up to $n$ are preferred respectively for increasing values of the state of
nature $\omega$, a given action $i$ being favored for known $\omega$ if and only if $L_i(\omega) < L_j(\omega)$
for every $j \ne i$. The fact that each of the sets $S_i$ is a non-degenerate interval implies
the existence of $\omega_i^0$, $i = 0, 1, \dots, n$, where $\omega_0^0 = -\infty$ and $\omega_n^0 = +\infty$, such
that $\omega_i^0 < \omega_{i+1}^0$ and action $i$ is definitely preferred for $\omega_{i-1}^0 < \omega < \omega_i^0$. The values
$\omega_i^0$ are necessarily the unique change points of $L_{i+1}(\omega) - L_i(\omega)$. For simplicity of
exposition we have chosen to make the change points $\omega_i^0$ distinct although the
reader may supply appropriate modifications to the argument to extend our
studies to the case when some of the $\omega_i^0$ coincide. (See [1] and [2].)

A randomized decision procedure $\varphi$ for the statistician is described by an
$n$-tuple of functions

    $\varphi = (\varphi_1(x), \varphi_2(x), \dots, \varphi_n(x))$,

$\varphi_i(x) \ge 0$ and $\sum_i \varphi_i(x) = 1$, where $\varphi_i(x)$ is interpreted as the probability of taking
action $i$ when $x$ is observed. The expected risk becomes

    $\rho(\omega, \varphi) = \int p(x, \omega) \Bigl( \sum_{i=1}^{n} L_i(\omega) \varphi_i(x) \Bigr) d\mu(x)$.

A procedure $\varphi$ is said to be admissible when there exists no other procedure $\varphi^*$
such that $\rho(\omega, \varphi^*) \le \rho(\omega, \varphi)$ for every $\omega$ with inequality for at least one value of
$\omega$. In other words, a procedure is admissible if it cannot be improved upon in
terms of expected risk, independent of the state of nature. Admissibility for a
given decision procedure is an obvious prerequisite for its use. Hence, it is of
some significance to be able to characterize all admissible procedures. A procedure
$\varphi^0$ is said to be Bayes with respect to a distribution $F(\omega)$ (referred to as
the a priori distribution of the state of nature) if

    $\rho(F, \varphi^0) = \int \rho(\omega, \varphi^0) \, dF(\omega) = \min_{\varphi} \int \rho(\omega, \varphi) \, dF(\omega)$.

It is readily seen that if $\varphi^0$ is unique Bayes with respect to a distribution $F(\omega)$
(i.e., $\varphi^0$ is the only procedure minimizing $\rho(F, \varphi)$), then $\varphi^0$ is admissible. Consequently,
one method of establishing that a given procedure is admissible is to
show that it is unique Bayes with respect to some distribution $F(\omega)$. In numerous
cases, we shall actually verify this property.

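In the n-action problem a procedure $\varphi = (\varphi_1, \varphi_2, \ldots, \varphi_n)$ is said to be
monotone if there exist critical numbers $x_0 \le x_1 \le x_2 \le \cdots \le x_{n-1} \le x_n$
($x_0 = -\infty$, $x_n = +\infty$) such that

$$\varphi_i(x) = \begin{cases} 1, & x_{i-1} < x < x_i, \\ 0, & x < x_{i-1} \text{ or } x > x_i, \end{cases}$$

and randomization may occur at $x = x_i$, i.e., $\varphi_i(x_i) = \lambda_i$ and
$\varphi_{i+1}(x_i) = 1 - \lambda_i$ ($0 \le \lambda_i \le 1$). A monotone procedure is therefore fully
specified by $(x_1, x_2, \ldots, x_{n-1};\ \lambda_1, \ldots, \lambda_{n-1})$ provided $x_i \le x_{i+1}$ and
$0 \le \lambda_i \le 1$, with appropriate modifications on the restrictions for $\lambda_i$ when
allowing for $x_i = x_{i+1}$. It was shown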
in [2] that the monotone procedures form a complete class. Moreover, the proof 
of completeness of the preceding reference contained an explicit construction 
which shows how to improve by a monotone procedure any specified non- 
monotone procedure. However, the question of determining when monotone 
procedures are admissible is of greater complexity. In the two-action problem, 
it was shown in [1] under almost negligible restrictions that all monotone pro- 
cedures are admissible. In direct contrast, for the case of a general three-action 
problem the characteristic of admissibility ceases to be a property shared by all 
monotone decision procedures. For a counterexample see [2]. Apparently the 
explicit magnitudes of the loss functions and not only the preference regions have 
a direct influence on whether a procedure is admissible or not. Nevertheless, it 
is possible to characterize a wide class of multi-action monotone decision prob- 
lems for which all monotone procedures are admissible. Consider a collection of 
loss functions $L_i(\omega)$ satisfying

$$|L_i(\omega) - L_{i+1}(\omega)| = b_{ij} \quad \text{for } \omega \text{ in } S_j \tag{I}$$

($i = 1, 2, \ldots, n-1$ and $j = 1, \ldots, n$) such that $b_{ij} \ge 0$ for every $i$ and $j$,
$b_{ik} > 0$ for $k = i$, and for $i = 1, 2, \ldots, n-1$,

$$\begin{vmatrix} b_{ij} & b_{ik} \\ b_{i+1,j} & b_{i+1,k} \end{vmatrix} \ge 0 \tag{II}$$

whenever $1 \le j \le i$ and $i + 1 \le k \le n$. (For instance, if $b_{ij} = c > 0$ then (II)
is certainly satisfied.) We will show later that for monotone loss functions
satisfying these conditions all non-degenerate monotone procedures are admissible.
In fact, all non-degenerate monotone procedures are found to be Bayes with
respect to suitable finite distributions.

By allowing the number of possible actions to become infinite, our multi- 
action problem approaches an estimation problem. That is, the estimation prob- 
lem may be viewed both formally and practically as a limit of finite action 
problems. Therefore, aside from interest in itself, the n-action decision problem
also suggests and leads to consequences about estimation problems. In the case
of estimation a non-randomized procedure is described by a mapping $a(x)$ of the
observed value $x$ into the space of actions. The loss function $L(a, \omega)$ is now a
function of the action $a$ taken and the state of nature $\omega$. For example, $(a - \omega)^2$
would correspond to square error where $a$ represents the estimate of $\omega$ used.
Similarly, $|a - \omega|$ is the commonly used loss function measuring absolute error.

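The case where

$$L_i(\omega) - L_{i+1}(\omega) = \begin{cases} -\alpha & \text{if } \omega \text{ is in } S_j,\ j \le i, \\ +\alpha & \text{if } \omega \text{ is in } S_j,\ j \ge i + 1, \end{cases}$$

or specifically $L_i(\omega) = \alpha |j - i|$ for $\omega$ in $S_j$, may be considered as the discrete
analog for n actions of the absolute error loss function. This last example satisfies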
the conditions of (II) so that all non-degenerate monotone procedures are ad- 
missible. In contrast, discrete analogs of the square error loss function do not 
satisfy (II), and it is unknown whether all monotone procedures are admissible. 
In Sec. 2, we shall investigate in detail the discrete absolute error loss functions. 
With the aid of suitable limiting arguments admissibility for some statistical 
procedures of the estimation problem with absolute error loss function will be 
presented in Sec. 4. 
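The contrast between the two discrete analogs can be checked mechanically. The sketch below is illustrative only: the helper names are hypothetical, the losses are assumed constant on each $S_j$, and the determinant test is applied only to pairs of adjacent rows $i$, $i+1$ that both exist in the $(n-1) \times n$ array $(b_{ij})$.

```python
from typing import Callable, List


def loss_difference_matrix(loss: Callable[[int, int], float], n: int) -> List[List[float]]:
    """b[i-1][j-1] = |L_i - L_{i+1}| on S_j, for i = 1..n-1 and j = 1..n."""
    return [[abs(loss(i, j) - loss(i + 1, j)) for j in range(1, n + 1)]
            for i in range(1, n)]


def satisfies_condition_II(b: List[List[float]]) -> bool:
    """Check the 2x2 determinant condition (II) on adjacent rows of b."""
    rows, n = len(b), len(b[0])
    for r in range(rows - 1):          # rows i = r+1 and i+1 (1-indexed)
        for j in range(r + 1):         # columns 1 <= j <= i
            for k in range(r + 1, n):  # columns i+1 <= k <= n
                if b[r][j] * b[r + 1][k] - b[r][k] * b[r + 1][j] < 0:
                    return False
    return True


# Discrete absolute error: L_i = |j - i| on S_j, so every b_ij equals 1.
absolute = loss_difference_matrix(lambda i, j: abs(i - j), 4)
# Discrete square error: L_i = (j - i)**2 on S_j.
squared = loss_difference_matrix(lambda i, j: (i - j) ** 2, 4)

absolute_ok = satisfies_condition_II(absolute)
squared_ok = satisfies_condition_II(squared)
```

For the square error analog the first two rows of $(b_{ij})$ are $(1, 1, 3, 5)$ and $(3, 1, 1, 3)$, and the determinant built from columns 1 and 2 is already negative.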

Also, in Sec. 3 we analyze the general monotone loss functions satisfying prop- 
erties (I) and (II). In Sec. 5 we investigate a loss function for which the penalties 
are constant when we underestimate and a loss given by a monotone increasing 
function of the extent of overestimation when we overestimate. Minimal com- 
plete classes of statistical procedures may be fully determined. All monotone 
strategies are in this case admissible. The reader should bear in mind that these 
results are in sharp contrast to the general n action problem where all monotone 
procedures are not necessarily admissible. 

Within a complete class the statistician obviously should not choose an
inadmissible procedure. The principal result of this paper is the validation of the
fact that at least for loss functions satisfying (I) and (II) the non-degenerate
monotone strategies are admissible. The remaining deficiency of this theory is that
in the n-action problem the class of monotone strategies represents an $n - 1$
parameter family of procedures. Thus the task of choosing a single procedure is
still bewildering and some cogent principles are needed to reduce the size of the
class. In the following article [4], we advance some principles to guide the
statistician in selecting a specific monotone strategy from the essentially complete
class of all monotone procedures.

The study of admissibility concerning the square error loss function will be 
presented in a later publication. 

Finally, I wish to express my indebtedness to Mr. Rupert Miller for his help 
in the preparation of this manuscript. 


1. Some preliminary lemmas. Basic to our study of the question of admissibility
for loss functions satisfying the conditions of (I) and (II) are the following
propositions concerning solutions of systems of linear equations of a specific form
which have special properties. These linear equation results are singled out here
because of their independent mathematical interest. The reader interested only
in their statistical relevance may on first reading pass over their proofs.

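Lemma 1. The system of $n$ homogeneous equations in $n + 1$ unknowns, $n \ge 2$,

$$\sum_{j=1}^{i} a_{ij} x_j - \sum_{j=i+1}^{n+1} a_{ij} x_j = 0, \qquad i = 1, \ldots, n, \tag{*}$$

where $A$, the coefficient matrix of size $n \times (n + 1)$, satisfies the following properties:
(i) $a_{ij} \ge 0$ for all $i, j$; $a_{ii} > 0$ for $i = 1, \ldots, n$;
(ii) for $i = 1, 2, \ldots, n - 1$,

$$\begin{vmatrix} a_{ij} & a_{ik} \\ a_{i+1,j} & a_{i+1,k} \end{vmatrix} \ge 0$$

for $1 \le j \le i$ and $i + 1 \le k \le n + 1$ with strict inequality for some $j$
for each $k$,
has a unique (except for a multiplicative constant) solution $x^0 = (x_1^0, \ldots, x_{n+1}^0)$
which has in addition the following properties:

(a) $x_j^0 \ne 0$, $j = 1, \ldots, n + 1$;
(b) $\operatorname{sgn} x_1^0 = \operatorname{sgn} x_2^0 = \cdots = \operatorname{sgn} x_{n+1}^0$.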


An equivalent formulation of Lemma 1 in terms of non-homogeneous linear
equations is as follows:
Lemma 1a. The system of $n$ non-homogeneous equations in $n$ unknowns

$$\sum_{j=1}^{i} a_{ij} x_j - \sum_{j=i+1}^{n} a_{ij} x_j = a_{i,n+1}, \qquad i = 1, \ldots, n,$$

where the $n \times (n + 1)$ matrix $A = (a_{ij})$ satisfies properties (i) and (ii) of Lemma 1,
has a unique solution $x^0 = (x_1^0, \ldots, x_n^0)$ which has in addition the property that
$x_i^0 > 0$, $i = 1, \ldots, n$.

The proof of the equivalence of Lemma 1 and Lemma 1a is straightforward
and will be omitted.
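A small numerical illustration of Lemma 1a may help fix ideas. The sketch below is not taken from the paper: it assumes the hypothetical test matrix $a_{ij} = 2^{ij}$ (which satisfies (i) and (ii) with strict inequality, since the relevant determinants are $2^{ij + (i+1)k} - 2^{ik + (i+1)j} > 0$ for $j < k$), and solves the signed system exactly with rational arithmetic.

```python
from fractions import Fraction
from typing import List


def solve(M: List[List[Fraction]], rhs: List[Fraction]) -> List[Fraction]:
    """Gauss-Jordan elimination over the rationals (assumes M is nonsingular)."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def lemma_1a_solution(a: List[List[Fraction]]) -> List[Fraction]:
    """a is n x (n+1); equation i reads
    sum_{j<=i} a_ij x_j - sum_{i<j<=n} a_ij x_j = a_{i,n+1}."""
    n = len(a)
    signed = [[a[i][j] if j <= i else -a[i][j] for j in range(n)] for i in range(n)]
    return solve(signed, [a[i][n] for i in range(n)])


# Hypothetical test matrix a_ij = 2**(i*j) with 1-indexed i, j.
n = 3
a = [[Fraction(2 ** ((i + 1) * (j + 1))) for j in range(n + 1)] for i in range(n)]
x = lemma_1a_solution(a)
```

For this matrix the solution is $(64, 56/3, 14/3)$, all of whose components are positive, as the lemma asserts.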

Proof of Lemma 1a (by induction). Suppose the result is true for $n - 1$
non-homogeneous equations in $n - 1$ unknowns. We prove it is also true for $n$
non-homogeneous equations in $n$ unknowns.
Consider the first $n - 1$ equations of the system of $n$ equations. They can be
written as

$$\sum_{j=1}^{i} a_{ij} x_j - \sum_{j=i+1}^{n-1} a_{ij} x_j - a_{in}\alpha - a_{i,n+1} = 0, \qquad i = 1, \ldots, n - 1, \tag{1}$$

where $x_n = \alpha$, or as

$$\sum_{j=1}^{i} a'_{ij} x_j - \sum_{j=i+1}^{n-1} a'_{ij} x_j - a'_{in} = 0, \qquad i = 1, \ldots, n - 1, \tag{2}$$

where $a'_{ij} = a_{ij}$ for $j = 1, 2, \ldots, n - 1$, and $a'_{in} = a_{in}\alpha + a_{i,n+1}$. It is readily
verified that the matrix $A' = (a'_{ij})$ satisfies property (ii) provided $\alpha \ge 0$.
Property (i) is also satisfied so by Lemma 1a for the case of $n - 1$ equations in $n - 1$
unknowns there exists a unique solution $x(\alpha) = (x_1(\alpha), \ldots, x_{n-1}(\alpha))$ to (1) for
each $\alpha \ge 0$ and $x_i(\alpha) > 0$, $i = 1, \ldots, n - 1$.

Let $g(x(\alpha)) = a_{n1}x_1(\alpha) + \cdots + a_{n,n-1}x_{n-1}(\alpha) + a_{nn}\alpha - a_{n,n+1}$. For
$\alpha > a_{n,n+1}/a_{nn}$, $g(x(\alpha)) > 0$ since $x_i(\alpha) > 0$, $i = 1, \ldots, n - 1$. We assert that for
$\alpha = 0$, $g(x(0))$ is $< 0$. Suppose the contrary; i.e., suppose $g(x(0)) \ge 0$. If the
equation

$$a_{n-1,1}x_1(0) + \cdots + a_{n-1,n-1}x_{n-1}(0) - a_{n-1,n+1} = 0$$

is multiplied by $a_{n,n+1}$, the relation

$$a_{n1}x_1(0) + \cdots + a_{n,n-1}x_{n-1}(0) - a_{n,n+1} \ge 0$$

is multiplied by $-a_{n-1,n+1}$, and the two are then added, the result is

$$\begin{vmatrix} a_{n-1,1} & a_{n-1,n+1} \\ a_{n1} & a_{n,n+1} \end{vmatrix} x_1(0) + \cdots + \begin{vmatrix} a_{n-1,n-1} & a_{n-1,n+1} \\ a_{n,n-1} & a_{n,n+1} \end{vmatrix} x_{n-1}(0) \le 0.$$

But each determinant is non-negative and at least one is strictly positive with
$x_i(0) > 0$, $i = 1, \ldots, n - 1$. This leads to an obvious contradiction. Therefore,
$g(x(0)) < 0$. Since $g(x(\alpha))$ is a continuous function of $\alpha$, there must exist an
$\alpha_0 > 0$ such that

$$a_{n1}x_1(\alpha_0) + \cdots + a_{n,n-1}x_{n-1}(\alpha_0) + a_{nn}\alpha_0 - a_{n,n+1} = 0.$$

Consequently, one solution to the system of $n$ non-homogeneous equations in
the $n$ unknowns which has the property that $x_i > 0$, $i = 1, \ldots, n$, is
$x^0 = (x_1(\alpha_0), \ldots, x_{n-1}(\alpha_0), \alpha_0)$.

To complete the induction proof of the existence of a positive solution we must 
verify that the lemma holds for the case n = 2. This task is reduced to routine 
enumeration of cases with direct use of the hypothesis. 

It remains to establish the uniqueness of the solution. Suppose there exist two
solutions $x^0 = (x_1^0, \ldots, x_n^0)$ and $y^0 = (y_1^0, \ldots, y_n^0)$ such that $x^0 \ne y^0$. Then

$$\sum_{j=1}^{i} a_{ij} z_j - \sum_{j=i+1}^{n} a_{ij} z_j = 0, \qquad i = 1, \ldots, n, \tag{3}$$

for $z_i = x_i^0 - y_i^0$, $i = 1, \ldots, n$. Consider the first $n - 1$ equations. By the
induction hypothesis the solution to this system of $n - 1$ equations in $n$ unknowns
is unique (except for a multiplicative constant) and possesses properties (a) and (b).
Since $z_i = c(x_i^0 - y_i^0)$ for $c \ne 0$ is the family of solutions, without loss of
generality it can be assumed that $x_i^0 - y_i^0 > 0$, $i = 1, \ldots, n$. But for the $n$th
equation this yields

$$a_{n1}(x_1^0 - y_1^0) + \cdots + a_{nn}(x_n^0 - y_n^0) > 0,$$

which contradicts (3). That uniqueness holds for the case $n = 2$ is easily checked.
Thus, $x_i^0 = y_i^0$.
This lemma can be expressed in terms of appropriate subdeterminants as follows:
Corollary 1. The signs of the $n + 1$ subdeterminants of size $n \times n$ of the coefficient
matrix $A$ of the system of equations (*) obtained by deleting successive columns must
alternate.


Lemma 2. The system of $n + 1$ equations in $n$ unknowns

$$\sum_{i=1}^{j-1} a_{ij} y_i - \sum_{i=j}^{n} a_{ij} y_i = c_j, \qquad j = 1, \ldots, n + 1, \tag{4}$$

where

(i) $a_{ij} \ge 0$ for all $i, j$; $a_{ii} > 0$ for $i = 1, \ldots, n$;
(ii) for $j = 2, \ldots, n - 1$,

$$\begin{vmatrix} a_{kj} & a_{k,j+1} \\ a_{ij} & a_{i,j+1} \end{vmatrix} \ge 0$$

for $j + 1 \le i \le n$, $1 \le k \le j$ with strict inequality for some $i$ for each $k$;
(iii) $A = (a_{ij})$ satisfies condition (ii) of Lemma 1;
(iv) $c_j \ge 0$, $j = 1, \ldots, n + 1$,

has a solution only if $c_j = 0$, $j = 1, \ldots, n + 1$, and in this case the only solution
is the trivial one $y_i = 0$, $i = 1, \ldots, n$.

Proof. Suppose $c_j > 0$ for some $j$ and there exists a solution $y^0 = (y_1^0, \ldots, y_n^0)$
to the system of equations (4), which can be written in matrix notation as $yA^* = c$,
where $c = (c_1, \ldots, c_{n+1})$ and $A^* = (a^*_{ij})$ with $a^*_{ij} = a_{ij}$, $i < j$, and
$a^*_{ij} = -a_{ij}$, $i \ge j$. Since the conditions of Lemma 1 are satisfied for the system of
equations $-A^*z = 0$, there exists a solution $z^0$, all of whose components are
positive. Since $y^0A^* = c$ and $A^*z^0 = 0$,

$$0 < (c, z^0) = (y^0A^*, z^0) = (y^0, A^*z^0) = (y^0, 0) = 0,$$

where $(\alpha, \beta)$ denotes the inner product of the vectors $\alpha$ and $\beta$, which is a
contradiction. Therefore, $c_j = 0$ for $j = 1, \ldots, n + 1$.

If we omit the first and last columns of the matrix $A^*$ and consider the system
of equations

$$a_{nn}y_n - a_{n-1,n}y_{n-1} - \cdots - a_{2n}y_2 - a_{1n}y_1 = 0,$$
$$\vdots$$
$$a_{n2}y_n + a_{n-1,2}y_{n-1} + \cdots + a_{22}y_2 - a_{12}y_1 = 0,$$

properties (i) and (ii) reduce to the conditions of Lemma 1 for this system of
equations, so that any non-trivial solution $y^0 = (y_1^0, \ldots, y_n^0)$ to (4) has its
components of the same sign and unequal to zero. But this is impossible since

$$a_{n1}y_n^0 + a_{n-1,1}y_{n-1}^0 + \cdots + a_{11}y_1^0$$

will be unequal to zero if $y_i^0 > 0$ for all $i$ or $y_i^0 < 0$ for all $i$. Thus, $y_i^0 = 0$ for
all $i$.
In closing this section, we remark that an alternative, more complicated 
proof of Lemma 1 was given independently by P. Braumann. 


2. Discrete absolute error loss functions. In the n-action problem under
consideration in this section, the loss functions $L_i(\omega)$, $i = 1, \ldots, n$, will have
the following form: There exist $n - 1$ values $\omega_1^0, \ldots, \omega_{n-1}^0$ such that
$L_i(\omega) = c|j - i|$ for $\omega$ in the interval $(\omega_{j-1}^0, \omega_j^0]$, $j = 1, \ldots, n$. (By definition
$\omega_0^0 = -\infty$ and $\omega_n^0 = +\infty$.) This system of loss functions may be viewed as the discrete
analog of the absolute error loss function in estimation problems. The loss for
any action is proportional to the distance from the best action. As $n$, the number
of actions, tends to $+\infty$ and $|\omega_i^0 - \omega_{i-1}^0| \to 0$ suitably as $n \to +\infty$ for all $i$,
then $L_{i_n}(\omega) \to c|a - \omega|$, where $i_n$ is defined by $a \in (\omega_{i_n - 1}^0, \omega_{i_n}^0]$. Thus the
absolute error loss function is an appropriate limit of discrete absolute error loss
functions.
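One way to read "suitably" in the limit above can be sketched numerically. The following is only an illustration under stated assumptions: the change points are taken equally spaced, $\omega_j^0 = jh$, and the constant in the discrete loss is rescaled to $c\,h$ so that the discrete loss is measured in the same units as $c|a - \omega|$. Neither choice is specified in the text, and the function names are hypothetical.

```python
import math


def interval_index(t: float, h: float) -> int:
    """Index j of the half-open interval ((j - 1) * h, j * h] containing t."""
    return math.ceil(t / h)


def discrete_abs_loss(a: float, omega: float, h: float, c: float = 1.0) -> float:
    """Rescaled discrete absolute error loss c * h * |j - i|, where the action
    index i is the interval containing a and j is the interval containing omega."""
    return c * h * abs(interval_index(a, h) - interval_index(omega, h))


# The discrete loss differs from |a - omega| by at most one grid width h.
gap_coarse = abs(discrete_abs_loss(0.3, 1.7, 0.1) - abs(0.3 - 1.7))
gap_fine = abs(discrete_abs_loss(0.3, 1.7, 0.001) - abs(0.3 - 1.7))
```

Under these assumptions the approximation error is bounded by $c\,h$, so it vanishes as the grid is refined.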


The real random variable $X$ will be assumed to be distributed according to

$$P(x, \omega) = \int_{-\infty}^{x} p(\xi, \omega)\, d\mu(\xi),$$

which depends on the real parameter $\omega$. $p(\xi, \omega)$ will be assumed to possess a strict
monotone likelihood ratio. The requirement of strictness may be relaxed in
many cases, but to avoid inessential, tedious details we have preferred to impose
the slightly stronger assumption of strictness.

For the type of loss function described above, it will be shown in this section
that any non-degenerate monotone procedure characterized by $n - 1$ points
$(x_1^0, x_2^0, \ldots, x_{n-1}^0)$ and $n - 1$ probabilities $\lambda_1^0, \ldots, \lambda_{n-1}^0$ is admissible. As was
pointed out in the introduction, this is not true for general loss functions when
$n \ge 3$.


Lemma 3. Any non-degenerate monotone procedure is Bayes against a discrete
a priori distribution $F^*$ which concentrates all its probability at $n$ points; each
interval $(\omega_{i-1}^0, \omega_i^0]$, $i = 1, \ldots, n$, contains a mass-point of $F^*$, but the location of
the mass-point in the interval is arbitrary. The non-degenerate monotone procedure
is uniquely Bayes with respect to $F^*$ except for the randomizations $\lambda_1^0, \ldots, \lambda_{n-1}^0$.

Proof. For any observed $x$, the a posteriori risk of taking action $i$, with respect
to the a priori distribution $F$, is

$$r_x(i) = K \int_{-\infty}^{\infty} L_i(\omega)\, p(x, \omega)\, dF(\omega),$$

where $K^{-1} = \int_{-\infty}^{\infty} p(x, \omega)\, dF(\omega)$. Action $i$ would be preferred to action $i + 1$
for those values of $x$ for which

$$r_x(i + 1) - r_x(i) = K \int_{-\infty}^{\infty} [L_{i+1}(\omega) - L_i(\omega)]\, p(x, \omega)\, dF(\omega) \tag{5}$$

is $> 0$, and $i + 1$ would be preferred to $i$ when $r_x(i + 1) - r_x(i) < 0$. But
$L_{i+1}(\omega) - L_i(\omega)$ changes sign exactly once and changes from positive to negative
as $\omega$ increases. Since $p(x, \omega)$ is strictly Pólya type 2, by Theorem 3 of [3]
$r_x(i + 1) - r_x(i)$ has at most one change of sign and at most one zero counting
multiplicities. Furthermore, in the event that $r_x(i + 1) - r_x(i)$ does change sign,
then it must change from positive to negative as $x$ increases [1].

Let $\omega_i$ be any arbitrary point in the interval $(\omega_{i-1}^0, \omega_i^0]$, $i = 1, \ldots, n$. It is
a consequence of Lemma 1 that $n$ constants $\xi_1, \ldots, \xi_n$ with $\xi_i > 0$, $i = 1, \ldots, n$,
can be chosen such that the monotone procedure described by
$(x_1^0, \ldots, x_{n-1}^0;\ \lambda_1^0, \ldots, \lambda_{n-1}^0)$ is Bayes against the distribution $F^*$ which
concentrates probability $\xi_i$ at the point $\omega_i$:

$$\int [L_{i+1}(\omega) - L_i(\omega)]\, p(x, \omega)\, dF^* = c \Big\{ \sum_{j=1}^{i} p(x, \omega_j)\xi_j - \sum_{j=i+1}^{n} p(x, \omega_j)\xi_j \Big\}.$$

Indeed, consider the system of equations

$$\sum_{j=1}^{i} p(x_i^0, \omega_j)\xi_j - \sum_{j=i+1}^{n} p(x_i^0, \omega_j)\xi_j = 0, \qquad i = 1, \ldots, n - 1.$$

Since $p(x, \omega)$ has a strict monotone likelihood ratio, the conditions of Lemma 1
are satisfied so there exists a solution to the above system of equations such that
$\xi_i > 0$ for all $i$ and $\sum_{i=1}^{n} \xi_i = 1$. Thus

$$\int [L_{i+1}(\omega) - L_i(\omega)]\, p(x, \omega)\, dF^*(\omega) \tag{6}$$

has a zero, its only zero, at the point $x_i^0$. Consequently, action $i$ is preferred to
action $i + 1$ for $x < x_i^0$ and action $i + 1$ is preferred for $x > x_i^0$. Similarly, action
$i + 1$ is preferred to action $i + 2$ for $x < x_{i+1}^0$. Since $x_i^0 < x_{i+1}^0$, action $i$ is
preferred to action $i + 2$ for $x < x_i^0$. Repetition of this argument shows that $i$ is
preferred to all $j > i$ for $x < x_i^0$. A similar argument shows that $i$ is preferred to
all actions $j < i$ for $x > x_{i-1}^0$. Thus action $i$ is the best action for
$x \in (x_{i-1}^0, x_i^0)$. At $x = x_i^0$ it is immaterial whether action $i$ or $i + 1$ is taken since
$r_x(i) = r_x(i + 1)$ at $x = x_i^0$. All randomizations between $i$ and $i + 1$ at $x = x_i^0$
will produce the same overall risk. This implies that the monotone procedure
$(x_1^0, \ldots, x_{n-1}^0;\ \lambda_1^0, \ldots, \lambda_{n-1}^0)$ is Bayes against $F^*$. Furthermore, it is the unique
Bayes strategy against $F^*$ (except for randomization allowed for action $i$ and
$i + 1$ at the points $x_i^0$) since $x_i^0$ is the unique zero of (6).
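The construction in this proof can be traced on a small example. The sketch below is an illustration under assumptions that are not in the paper: $n = 3$ actions, a geometric density $p(x, \omega) = (1 - \omega)\omega^x$ on $x = 0, 1, 2, \ldots$ with counting measure (which has a strict monotone likelihood ratio), hypothetical critical points $x_1^0 = 1$, $x_2^0 = 3$, and mass-points $\omega = 1/4, 1/2, 3/4$ assumed to lie one in each preference interval. For $n = 3$ the two homogeneous equations are solved by a cross product of the signed coefficient rows.

```python
from fractions import Fraction as Fr
from typing import List


def density(x: int, w: Fr) -> Fr:
    """Geometric density (1 - w) * w**x on x = 0, 1, 2, ... (assumed for illustration)."""
    return (1 - w) * w ** x


def prior_weights(x_crit: List[int], omegas: List[Fr]) -> List[Fr]:
    """Weights xi_1, xi_2, xi_3 solving, for i = 1, 2,
    sum_{j<=i} p(x_i, w_j) xi_j - sum_{j>i} p(x_i, w_j) xi_j = 0, normalized to sum 1."""
    p1 = [density(x_crit[0], w) for w in omegas]
    p2 = [density(x_crit[1], w) for w in omegas]
    r1 = [p1[0], -p1[1], -p1[2]]
    r2 = [p2[0], p2[1], -p2[2]]
    # A vector orthogonal to both signed rows spans the solution set.
    v = [r1[1] * r2[2] - r1[2] * r2[1],
         r1[2] * r2[0] - r1[0] * r2[2],
         r1[0] * r2[1] - r1[1] * r2[0]]
    total = sum(v)
    return [c / total for c in v]


def bayes_actions(x: int, omegas: List[Fr], xi: List[Fr]) -> List[int]:
    """Actions minimizing the (unnormalized) a posteriori risk with loss |j - i|."""
    risks = [sum(abs(i - j) * density(x, w) * q
                 for j, (w, q) in enumerate(zip(omegas, xi), start=1))
             for i in (1, 2, 3)]
    best = min(risks)
    return [i for i, r in zip((1, 2, 3), risks) if r == best]


omegas = [Fr(1, 4), Fr(1, 2), Fr(3, 4)]
xi = prior_weights([1, 3], omegas)
table = [bayes_actions(x, omegas, xi) for x in range(5)]
```

With these inputs the weights are $(13/24, 1/4, 5/24)$, and the Bayes actions are 1 at $x = 0$, 2 at $x = 2$, 3 at $x = 4$, with ties exactly at the two critical points, where any randomization between the adjacent actions is Bayes.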

The significant result of this section which is deduced from Lemma 3 is as
follows:
Theorem 1. All non-degenerate monotone procedures are admissible.

Proof. Suppose $\varphi^0$ is not admissible. Then there exists a decision procedure
$\varphi^* = (\varphi_1^*, \ldots, \varphi_n^*)$ such that $\rho(\omega, \varphi^*) \le \rho(\omega, \varphi^0)$ for all $\omega$ with strict
inequality for some $\omega'$. Suppose $\omega'$ falls in the interval $(\omega_{h-1}^0, \omega_h^0]$. Select $n - 1$
specific parameter points of $\Omega$ so that
$(\omega_1, \ldots, \omega_{h-1}, \omega', \omega_{h+1}, \ldots, \omega_n)$ satisfies
$\omega_i \in (\omega_{i-1}^0, \omega_i^0]$, $i = 1, \ldots, h - 1, h + 1, \ldots, n$. Then, by Lemma 3 a discrete
probability distribution $F^*$ can be constructed which has positive probability at
each of these points and only at these points, and against which $\varphi^0$ is Bayes.
But since $\rho(\omega_i, \varphi^*) \le \rho(\omega_i, \varphi^0)$ for $i = 1, \ldots, h - 1, h + 1, \ldots, n$ and

$$\rho(\omega', \varphi^*) < \rho(\omega', \varphi^0),$$

it follows that

$$\int \rho(\omega, \varphi^*)\, dF^*(\omega) < \int \rho(\omega, \varphi^0)\, dF^*(\omega),$$

which contradicts the fact that $\varphi^0$ is Bayes against $F^*$. Therefore, $\varphi^0$ must be
admissible.

The following lemma strengthens slightly the results of Theorem 1.

Lemma 4. If $\varphi^*$ and $\varphi^0$ are two non-degenerate monotone procedures and
$\rho(\omega, \varphi^*) \equiv \rho(\omega, \varphi^0)$, then $x_i^* = x_i^0$, $i = 1, \ldots, n - 1$, and $\lambda_i^* = \lambda_i^0$ for all $i$
such that $\mu(x_i^0) > 0$.

Proof. Since monotone procedures are uniquely Bayes except for randomization
at the endpoints of the intervals, $\rho(\omega, \varphi^*) \equiv \rho(\omega, \varphi^0)$ evidently implies that
$x_i^* = x_i^0$, $i = 1, \ldots, n - 1$. It can be easily verified (cf. Theorem 1 of [2]) that

$$\rho(\omega, \varphi^0) - \rho(\omega, \varphi^*) = \int p(x, \omega) \Big\{ \sum_{i=1}^{n-1} [L_i(\omega) - L_{i+1}(\omega)] \Big( \sum_{j=1}^{i} \varphi_j^0(x) - \sum_{j=1}^{i} \varphi_j^*(x) \Big) \Big\}\, d\mu(x)$$
$$= \sum_{i=1}^{n-1} [L_i(\omega) - L_{i+1}(\omega)]\, p(x_i^0, \omega)(\lambda_i^0 - \lambda_i^*)\, \mu(x_i^0)$$
$$= \sum_{i=1}^{n-1} [L_i(\omega) - L_{i+1}(\omega)]\, p(x_i^0, \omega)\, \eta_i,$$

where $\eta_i = (\lambda_i^0 - \lambda_i^*)\mu(x_i^0)$.
Evaluation of $\rho(\omega, \varphi^0) - \rho(\omega, \varphi^*)$ at $n$ points $\omega_1, \ldots, \omega_n$ which satisfy
$\omega_j \in (\omega_{j-1}^0, \omega_j^0]$ yields the system of equations

$$c \Big\{ \sum_{i=1}^{j-1} p(x_i^0, \omega_j)\eta_i - \sum_{i=j}^{n-1} p(x_i^0, \omega_j)\eta_i \Big\} = 0, \qquad j = 1, \ldots, n.$$

Since $p(x, \omega)$ has a strict monotone likelihood ratio, the conditions of Lemma 2
are satisfied, and therefore $\eta_i = 0$, $i = 1, \ldots, n - 1$. Hence $\lambda_i^* = \lambda_i^0$ whenever
$\mu(x_i^0) > 0$.

A monotone procedure is said to be degenerate if $x_i^0 = x_{i+1}^0$ for some $i$. Several
intervals can be missing as well, in which case $x_i^0 = x_{i+1}^0 = \cdots = x_{i+k}^0$ for the
appropriate combinations of $i$ and $k$. Theorem 1 does not extend to the case of
degenerate procedures; i.e., there exist inadmissible degenerate monotone
procedures. However, the degenerate monotone procedures do possess analogous
Bayes properties.

Lemma 5. If the a priori distribution $F$ concentrates no measure in the interval
$(\omega_{i-1}^0, \omega_i^0]$, then $x_{i-1}^0 = x_i^0$ in the (monotone) Bayes strategy with respect to $F$.
That is, action $i$ is never taken except possibly at the point $x_i^0$ where
randomization occurs.

Proof. Since $L_i(\omega) - L_{i+1}(\omega) = L_{i-1}(\omega) - L_i(\omega)$ for $\omega \notin (\omega_{i-1}^0, \omega_i^0]$,

$$\int_{-\infty}^{\infty} [L_i(\omega) - L_{i+1}(\omega)]\, p(x, \omega)\, dF(\omega) = \int_{-\infty}^{\infty} [L_{i-1}(\omega) - L_i(\omega)]\, p(x, \omega)\, dF(\omega).$$

But since these two integrals are identically equal they must have the same
zero (if one exists) at some point $x'$. Action $i$ is preferred to action $i + 1$ for
$x < x'$ and reciprocally for $x > x'$. Action $i - 1$ is preferred to action $i$ for
$x < x'$ and vice versa for $x > x'$. Combining these two facts, it is seen that
action $i - 1$ is preferred to action $i + 1$ for $x < x'$ and vice versa for $x > x'$.
Action $i$ is preferred nowhere except possibly at the point $x'$. Thus,
$x_{i-1}^0 = x_i^0 = x'$.
The analog of Lemma 3 for degenerate monotone procedures is as follows:
Lemma 3a. Any monotone procedure $\varphi^0$ with $k$ degenerate intervals,
$1 \le k \le n - 1$, is Bayes with respect to a discrete a priori distribution $F^*$ which
concentrates all its probability at $n - k$ points; each interval $(\omega_{i-1}^0, \omega_i^0]$
corresponding to a non-degenerate action interval in the $X$-space contains a
mass-point of $F^*$, but the location of the mass-point in the interval is arbitrary.
The monotone procedure is uniquely Bayes against $F^*$ except for the determination
of the randomizations $\lambda_1^0, \ldots, \lambda_{n-1}^0$.
The proof of this is more elaborate. Nevertheless, since the techniques are
similar to the preceding, we omit the details.
As mentioned previously, there exist inadmissible degenerate monotone
procedures. An example, for which the author is indebted to the referee, can be
constructed as follows: Let $n = 3$, and consider the strategy
$(x_1^0, x_2^0;\ \lambda_1^0, \lambda_2^0)$ defined by $x_1^0 = x_2^0 = x^0$, $\lambda_1^0 = \tfrac{1}{2}$, $\lambda_2^0 = 0$. If
$\mu(x^0) > 0$, then the strategy $(x_1^*, x_2^*;\ \lambda_1^*, \lambda_2^*)$ with $x_1^* = x_2^* = x^0$,
$\lambda_1^* = 0$, $\lambda_2^* = 1$ constitutes an improvement.
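The improvement can be checked by comparing the expected loss incurred at the atom $x^0$, since the two strategies agree everywhere else. The sketch below rests on an interpretation that the text does not spell out: at the common critical point, $\lambda_1$ and $\lambda_2$ are read as the probabilities of taking actions 1 and 2, with the remaining probability assigned to action 3. The loss is the discrete absolute error loss of Sec. 2 with $c = 1$.

```python
from typing import List, Sequence


def loss(action: int, state: int, c: float = 1.0) -> float:
    """Discrete absolute error loss c * |j - i| for action i when omega lies in S_j."""
    return c * abs(action - state)


def loss_at_atom(action_probs: Sequence[float], state: int) -> float:
    """Expected loss at the atom x^0 when actions 1, 2, 3 are taken with the given probabilities."""
    return sum(q * loss(i, state) for i, q in enumerate(action_probs, start=1))


# Assumed reading: (lambda_1, lambda_2, 1 - lambda_1 - lambda_2) over actions 1, 2, 3.
original = (0.5, 0.0, 0.5)  # lambda_1 = 1/2, lambda_2 = 0
improved = (0.0, 1.0, 0.0)  # lambda_1 = 0,   lambda_2 = 1

original_losses: List[float] = [loss_at_atom(original, j) for j in (1, 2, 3)]
improved_losses: List[float] = [loss_at_atom(improved, j) for j in (1, 2, 3)]
```

Under this reading the original strategy loses 1 in every state, while the alternative loses 1, 0, 1 for $\omega$ in $S_1$, $S_2$, $S_3$; the risks therefore agree on $S_1$ and $S_3$ and the alternative is strictly better on $S_2$ whenever $\mu(x^0) > 0$ and $p(x^0, \omega) > 0$ there.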

The following theorem describes some of the properties of degenerate monotone
procedures.

Theorem 1a. Let $\varphi^0$ be a degenerate monotone procedure for which

$$x_1^0 < \cdots < x_{i_0}^0 = x_{i_0+1}^0 = \cdots = x_{i_0+k}^0 < \cdots < x_{n-1}^0.$$

(1) If $\mu(x_{i_0}^0) = 0$, $\varphi^0$ is admissible.

(2) If $\mu(x_{i_0}^0) > 0$ and $\varphi^*$ satisfies $\rho(\omega, \varphi^*) \le \rho(\omega, \varphi^0)$ for all $\omega$, then $\varphi^*$ is a
degenerate monotone procedure with the following three properties:

(a) $x_i^* = x_i^0$, $i = 1, \ldots, n - 1$;

(b) $\lambda_i^* = \lambda_i^0$ if $\mu(x_i^0) > 0$, $i = 1, \ldots, i_0 - 1, i_0 + k + 1, \ldots, n - 1$;

(c) (fos — Abpaa) + 2ALpse-a — Afvaaa) +--+ + RIM, — AF) = O. 





850 SAMUEL KARLIN 


Proof. (1) and (2a) are trivial consequences of Lemma 3a.
An easy computation verifies (cf. Theorem 1 of [2]) that


n—1 
0 S pla, ¢ ) — p(w, ¢*) = 2. [Liw) — Liss(w)|p(az, one, 
1 


where 
* ( 
hj )u(xz), 
Evaluation of the last expression at a, , ~*~ , Wi , @igpetts °** y Wn, Qe (w; 
i l io,¥o +k + 1, --- ,n, produces the system of equations 


n—l 


(j- ig ‘| 
ci). o(x:,w;)nr — > P(x; , wn — > px:,wsn: > = 0, 


sigtk+1 


Sasi n—1 , 
c4 > p(x. w;)n; + : 2 p(x; win — z p(x; , w;)n: > > 0, 


totk+1 


J=mtk+i,---,n, 
-,m—lwtkt+ili-+: n— 1, 


and 


” Se 0 S44 o/s. ° * 0 * 

n - 7, 7 = (Nio+ - Mets! T ZAG +ie—1 ~ Matas fe + K(X, — Ais). 
vent 
The conditions of Lemma 2 are satisfied so 
” . 
ni = 0, =1,---,m,mtk +1, 

This implies properties (2b) and (2c). 

The statement of the analogous results when there are several groups of
degenerate intervals is left to the reader.

It is worthwhile giving an explicit statement to the following general result.

Corollary 1. If the measure $\mu$ is atomless, all monotone procedures are
admissible.

3. Some extensions. The results of the previous section concerning admissibility
for multi-action problems extend immediately to a more general type of
loss function, as indicated in the introduction. Specifically, we suppose that

$$L_i(\omega) = c_{ij} \quad \text{for } \omega \text{ in } S_j; \tag{I}$$

that is, the loss is constant when taking action $i$ instead of action $j$, which was
preferred. Aside from the usual requirement that the $L_i(\omega)$ give rise to a
monotone preference pattern, the further important assumption is that

$$|L_i(\omega) - L_{i+1}(\omega)| = b_{ij} \ge 0 \quad \text{for } \omega \text{ in } S_j, \qquad b_{ii} > 0,$$

such that for $i = 1, 2, \ldots, n - 1$,

$$\begin{vmatrix} b_{ij} & b_{ik} \\ b_{i+1,j} & b_{i+1,k} \end{vmatrix} \ge 0 \tag{II}$$

whenever $1 \le j \le i$ and $i + 1 \le k \le n$.

Again we assume that the density $p(x, \omega)$ possesses a strict monotone likelihood
ratio.

The place of the monotone procedures for these loss functions is summarized
in the following propositions.

Lemma 6. If the loss functions satisfy conditions (I) and (II), any non-degenerate
monotone procedure is Bayes against a discrete a priori distribution $F^*$ which
concentrates all its probability at $n$ points; each interval $(\omega_{i-1}^0, \omega_i^0]$, $i = 1, \ldots, n$,
contains a mass-point of $F^*$, but the location of the mass-point in the interval is
arbitrary. The non-degenerate monotone procedure is uniquely Bayes with respect to
$F^*$ except for the randomizations $\lambda_1^0, \ldots, \lambda_{n-1}^0$.

The proof of this lemma is completely analogous to that of Lemma 3. We
sketch the argument. Selecting $n$ points $\omega_j$, $j = 1, \ldots, n$, where
$\omega_j \in (\omega_{j-1}^0, \omega_j^0]$, we seek to determine a discrete distribution $F^*(\omega)$ with weights
$\xi_j$ located exclusively at $\omega_j$ such that

$$\int [L_{i+1}(\omega) - L_i(\omega)]\, p(x, \omega)\, dF^*(\omega) = \sum_{j=1}^{i} p(x, \omega_j)\, b_{ij}\, \xi_j - \sum_{j=i+1}^{n} p(x, \omega_j)\, b_{ij}\, \xi_j \tag{7}$$

vanishes only at $x_i^0$, $i = 1, \ldots, n - 1$, the critical values describing the specified
non-degenerate monotone procedure. The system of equations (7) is exactly of
the form of (*) of Sec. 1 with

$$a_{ij} = p(x_i^0, \omega_j)\, b_{ij}.$$

The hypotheses (I) and (II) and the fact that $p(x, \omega)$ possesses a strict monotone
likelihood ratio immediately imply that the conditions of Lemma 1 are fulfilled.
Consequently, we may conclude that an $F^*(\omega)$ with the desired properties exists.
From here on the proof is a paraphrase of that of Lemma 3.

Paralleling the method of obtaining Theorem 1 from Lemma 3, we deduce
from Lemma 6:

Theorem 2. If the loss functions satisfy conditions (I) and (II), all
non-degenerate monotone procedures are admissible.

The arguments of Lemma 3a dealing with degenerate monotone procedures
do not extend directly to this more general loss function. In fact, for this case
the conditions of (II) will be strengthened so that $b_{ij} > 0$, all $i, j$, and

$$B^{i}_{jk} = \begin{vmatrix} b_{ij} & b_{ik} \\ b_{i+1,j} & b_{i+1,k} \end{vmatrix} \ge 0 \tag{II'}$$

is satisfied for every choice of $1 \le j < k \le n$. Under this more stringent condition
it is now possible to show that every monotone degenerate procedure is
Bayes with respect to a finite discrete distribution $F$. The method of proof is an
extension of the ideas of Lemma 5. For example, let us consider the case where
a monotone strategy $\varphi$ is specified with critical numbers $(x_1, x_2, \ldots, x_{n-1})$
such that $x_{i_0} = x_{i_0+1}$ and all other $x_i$'s are distinct. We must distinguish two
cases: (a) If $B^{i_0}_{jk} = 0$ for all values of $j$ and $k$ satisfying $j < k$, then it is possible
to find a distribution $F(\omega)$ whose full mass concentrates at $n - 1$ values $\omega_i$
where $\omega_i$ belongs to $(\omega_{i-1}^0, \omega_i^0]$, for all $i \ne i_0 + 1$, against which $\varphi$ is Bayes. The
values $\omega_i$ may be selected arbitrarily provided only that they belong to the
appropriate intervals. This assertion can be verified along the lines of Lemmas 5
and 3a. (b) If for some $j_0 < k_0$, $B^{i_0}_{j_0 k_0} > 0$, then it follows from (II') that for all
$j < j_0$ and $k > k_0$ also $B^{i_0}_{jk} > 0$. By selecting arbitrarily $n$ values of $\omega_i$ subject only
to the condition that $\omega_i$ belongs to $(\omega_{i-1}^0, \omega_i^0]$, an n-point distribution $F(\omega)$ with
weights at $\omega_i$ may be constructed so that $\varphi$ is Bayes with respect to $F(\omega)$. The
proof of this statement, as in Lemma 3, reduces to an application of Lemma 1.

We summarize the conclusions of this analysis in the statement of our next
theorem.

Theorem 2a. If the loss functions satisfy properties (I) and (II') with $b_{ij} > 0$
for all $i, j$, then any degenerate monotone procedure $\varphi^0$ is Bayes. $\varphi^0$ is uniquely
Bayes except for the randomizations $\lambda_1^0, \ldots, \lambda_{n-1}^0$ at $x_1^0, \ldots, x_{n-1}^0$, so that if
$\varphi^0$ is inadmissible the decision procedure $\varphi^*$ which improves on $\varphi^0$ differs from
$\varphi^0$ only in the randomizations at $x_1^0, \ldots, x_{n-1}^0$.

Corollary 2. If $\mu$ is atomless, all monotone procedures are admissible.


4. Estimation with absolute error loss function. In the previous section it was
mentioned that the absolute error loss function $L(a, \omega) = c|a - \omega|$ is the limiting
case of the discrete absolute error loss function in an n-action problem. This
fact will be utilized in this section to prove that all bounded, continuous,
monotone estimates in the estimation problem for absolute error loss functions are
admissible.

Since the loss function $L(a, \omega) = c|a - \omega|$ is a convex function of $a$ for each
$\omega$, it is only necessary to consider non-randomized estimates, i.e., single-valued
functions $a(x)$ which map the space $X$ into the space $\Omega$. It was shown in [2]
that the class of monotone estimates is essentially complete when the loss is
absolute error. (An estimate is monotone if $x_1 < x_2$ implies $a(x_1) \le a(x_2)$.) This
result will be strengthened by showing that all bounded, continuous, monotone
estimates are in addition admissible. It would appear that this is about as general
a result for admissibility as can be obtained since it is not true in general that
arbitrary unbounded monotone estimates are admissible. Consider the problem
of estimating the parameter $\omega$ when $x$ is normally distributed with mean $\omega$ and
variance 1. The estimate $a(x) = x + k$ for constant $k \ne 0$ is a monotone estimate,
but it is strictly dominated by the estimate $a_0(x) = x$. The investigation
of the admissibility of certain natural unbounded monotone estimates on absolute
error loss functions is deferred to a subsequent publication.
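The domination of $x + k$ by $x$ in the normal example can be made explicit. For $X$ normal with mean $\omega$ and variance 1, $X + k - \omega$ is normal with mean $k$ and variance 1, so with $c = 1$ the risk $E|X + k - \omega|$ is the mean of a folded normal distribution and does not depend on $\omega$. The sketch below evaluates that standard closed form; the function name is hypothetical.

```python
import math


def abs_error_risk(k: float) -> float:
    """E|Z + k| for Z ~ N(0, 1): sqrt(2/pi) * exp(-k^2 / 2) + k * erf(k / sqrt(2)).

    This is the absolute error risk (c = 1) of the estimate a(x) = x + k when
    x ~ N(omega, 1); it is the same for every omega.
    """
    return (math.sqrt(2.0 / math.pi) * math.exp(-0.5 * k * k)
            + k * math.erf(k / math.sqrt(2.0)))


risk_unshifted = abs_error_risk(0.0)
risk_shifted = abs_error_risk(1.0)
```

The risk is symmetric in $k$ and strictly increasing in $|k|$, so every shifted estimate has uniformly larger risk than $a_0(x) = x$.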

Theorem 3. All bounded, continuous, monotone estimates are admissible.

Proof. Let $a_0(x)$ be a monotone and continuous estimate for which
$\lim_{x \to -\infty} a_0(x) = -b_1$ and $\lim_{x \to \infty} a_0(x) = b_2$, where $-b_1 < b_2$ and $b_1, b_2 < \infty$.
It will be shown that $a_0$ is the unique Bayes estimate with respect to some a priori
distribution $F^*$ and therefore is admissible. The problem is first reduced to the
finite action case with loss functions corresponding to discrete absolute error
functions.

To this end, let the $\Omega$-space, the real line, be divided into $2N + 3$ half-open
intervals $(\omega_{i-1}^N, \omega_i^N]$, $i = 1, 2, \ldots, 2N + 3$, where

Wo = i, 

b a be 

a 


Hs 
Wen+3 
and consider the (2N + 3)-action problem which is defined by 


N b; be 
Li(w) = 2+ 


for w € (wi_4, w' I, 


. . & > > ols vy 
- ,2N + 3. Define the discrete decision procedure ap (x) as follows: 
z<--—-N 


1 ‘ i+ 1 
—N on sc —N Sage 
+ V S2<. + N 
b; + be 


2N 


where | dp (- a =) +b, — (A; — 2) 


ryt b, + bel 
ossageee Ps (- + i) ta -Y<d = SN | 
2N +2 oe h. 


$a_0^N$ should be interpreted as a monotone decision procedure which for each value
of $x$ specifies that the action $a_0^N(x)$ should be taken. Since $a_0^N$ is monotone and
never involves taking actions 1 or $2N + 3$, by Lemma 3a $a_0^N$ is Bayes against a
discrete probability distribution $F_N$ whose spectrum is contained in the interval
$(-b_1 - (b_1 + b_2)/2N,\ b_2]$. Since the spectrum of $F_N$ is contained in a finite
interval and as $N \to \infty$ each interval becomes a subset of the previous interval,
the sequence $\{F_N\}$ has a subsequence $\{F_{N_i}\}$ such that $F_{N_i} \xrightarrow{L} F^*$, where $F^*$
is a distribution function. Without loss of generality, assume $F_N \xrightarrow{L} F^*$ as
$N \to \infty$. It is also clear that as $N \to \infty$, $L^N_{a_0^N(x)}(\omega) \to |a_0(x) - \omega|$ uniformly for
$\omega \in [-b_1, b_2]$. Therefore, since $p(x, \omega)$ is continuous in $\omega$ for each $x$, as $N \to \infty$,

$$\int L^N_{a_0^N(x)}(\omega)\, p(x, \omega)\, dF_N(\omega) \to \int |a_0(x) - \omega|\, p(x, \omega)\, dF^*(\omega).$$

Consider any other estimate $a(x)$ whose range is contained in the interval
$[-b_1, b_2]$. Without loss of generality $a(x)$ is monotone and for simplicity assume
$a(x)$ is continuous. At a point of discontinuity $a(x)$ can be approximated pointwise
by a continuous estimate. A limit argument gives the general case. Define
$a^N(x)$ analogously to $a_0^N(x)$. Since $a_0^N$ is Bayes against $F_N$,

$$\int_{-\infty}^{\infty} L^N_{a_0^N(x)}(\omega)\, p(x, \omega)\, dF_N(\omega) \le \int_{-\infty}^{\infty} L^N_{a^N(x)}(\omega)\, p(x, \omega)\, dF_N(\omega).$$

As $N \to \infty$, $L^N_{a^N(x)}(\omega) \to |a(x) - \omega|$ uniformly so

$$\int_{-\infty}^{\infty} |a_0(x) - \omega|\, p(x, \omega)\, dF^*(\omega) \le \int_{-\infty}^{\infty} |a(x) - \omega|\, p(x, \omega)\, dF^*(\omega).$$

Thus, $a_0$ is Bayes against $F^*$ when the class of possible estimates is restricted to
those whose ranges are contained in $[-b_1, b_2]$. But any estimate $b(x)$ which
assumes values outside the interval $[-b_1, b_2]$ can obviously be improved upon by
an estimate whose range is contained in $[-b_1, b_2]$ since the spectrum of $F^*$ is
contained in $[-b_1, b_2]$. Therefore, $a_0$ is Bayes against $F^*$. It is also clear that the
spectrum of $F^*$ comes arbitrarily close to the extreme values $-b_1$ and $b_2$. This
fact is utilized below in the discussion of admissibility.

To prove admissibility it must be shown that ao is uniquely Bayes against F*. 

For a fixed $x$ in the positive sample space

$$\rho(F^*, a) = \int_{-\infty}^{\infty} |a - \omega|\, p(x, \omega)\, dF^*(\omega)$$

is a convex function of $a$. If it can be shown that $\rho(F^*, a)$ is strictly convex over
$[-b_1, b_2]$, then the minimum will be unique. For $a_1, a_2 \in [-b_1, b_2]$, $a_1 < a_2$, and
$0 < \lambda < 1$,

$$\rho(F^*, \lambda a_1 + (1 - \lambda)a_2) = \int_{-\infty}^{\infty} |\lambda(a_1 - \omega) + (1 - \lambda)(a_2 - \omega)|\, p(x, \omega)\, dF^*(\omega).$$

Now $|\lambda(a_1 - \omega) + (1 - \lambda)(a_2 - \omega)| \le \lambda|a_1 - \omega| + (1 - \lambda)|a_2 - \omega|$ with strict
inequality for $a_1 < \omega < a_2$. Thus,

$$\rho(F^*, \lambda a_1 + (1 - \lambda)a_2) < \lambda\rho(F^*, a_1) + (1 - \lambda)\rho(F^*, a_2)$$

will follow if $F^*$ assigns positive measure to every open set in the interval
$[-b_1, b_2]$. Hence, it suffices to show that $F^*$ has positive measure throughout
the whole interval. Suppose the contrary; there exist constants, $b_3$ and $b_4$, such
that $-b_1 < b_3 < b_4 < b_2$ and $0 < F^*(b_3) = F^*(b_4 + 0) < 1$. For $a \in [b_3, b_4]$,


$$\rho(F^*, a) = \int_{-\infty}^{b_3 + 0} (a - \omega)\, p(x, \omega)\, dF^*(\omega) + \int_{b_4}^{\infty} (\omega - a)\, p(x, \omega)\, dF^*(\omega)$$

$$= a\left[\int_{-\infty}^{b_3 + 0} p(x, \omega)\, dF^*(\omega) - \int_{b_4}^{\infty} p(x, \omega)\, dF^*(\omega)\right] + K(x, \omega),$$

where $K(x, \omega)$ is a function of $x$ and $\omega$ independent of $a$. If the expression in
brackets is positive, $\min_{b_3 \le a \le b_4} \rho(F^*, a) = \rho(F^*, b_3)$, and if the expression is negative,
$\min_{b_3 \le a \le b_4} \rho(F^*, a) = \rho(F^*, b_4)$. Thus, $a_0(x)$ can assume values between $b_3$
and $b_4$ only if the expression in brackets is zero. For each $x$ in the non-degenerate
interval $\{x : b_3 < a_0(x) < b_4\}$, we must have

$$\int_{-\infty}^{b_3 + 0} p(x, \omega)\, dF^*(\omega) - \int_{b_4}^{\infty} p(x, \omega)\, dF^*(\omega) = \int_{-\infty}^{\infty} h(\omega)\, p(x, \omega)\, dF^*(\omega) = 0,$$

where

$$h(\omega) = \begin{cases} 1, & \omega \le b_3, \\ -1, & \omega > b_3. \end{cases}$$

But since $h(\omega)$ changes sign exactly once, $\int_{-\infty}^{\infty} h(\omega)\, p(x, \omega)\, dF^*(\omega)$ has at most one
zero by Theorem 3 of [3] and cannot equal zero for an interval of $x$'s. Thus, $F^*$
must assign measure to every open interval in $[-b_1, b_2]$ and the theorem is proved
for $-b_1 < b_2$.

It is a trivial verification that all estimates of the form a(x) = c, c constant, 
are admissible. (In fact, it is the unique Bayes strategy with respect to the dis- 
tribution concentrating all its probability at the point w = c.) 

This completes the proof of the theorem. 


5. n-action problem for a special loss function. Consider the n-action problem
which is defined by the $n + 1$ values $\omega_0, \omega_1, \ldots, \omega_n$ ($\omega_0 = -\infty$ and $\omega_n = +\infty$)
and the following set of loss functions:

$$L_i(\omega) = \begin{cases} \psi_i(\omega), & \omega \le \omega_{i-1}, \\ 0, & \omega \in (\omega_{i-1}, \omega_i], \\ c > 0, & \omega > \omega_i, \end{cases}$$

where $\psi_i(\omega)$ is a monotone decreasing function of $\omega$, $i = 1, \ldots, n$, and where
$\psi_i(\omega) - \psi_j(\omega) > 0$ for $i > j$ and $\omega$ in their common domain of definition. (It
will be assumed that $\psi_i(\omega)$ is sufficiently smooth to justify differentiation inside
the integral sign of the a posteriori risk function.) The loss for an action $i$ is
constant if the corresponding $\omega$-interval is underestimated and increases as the
magnitude of the error increases if the $\omega$-interval is overestimated. Such a
family of loss functions is suggested by the following problem. It is desired to 
determine how much material will be required to construct a bridge across a 
certain river. If insufficient material is ordered, the bridge will not be completed 
and the loss is the same regardless of how small or large the discrepancy is. On 
the other hand, if there is an overabundance of material, the loss is proportional 
to the amount of excess, wasted material. 

The assumptions concerning the distribution functions

$$P(x, \omega) = \int_{-\infty}^{x} p(\xi, \omega)\, d\mu(\xi)$$

remain the same as in Sec. 2.

The main result of this investigation is embodied in Theorem 4 given below.
The proof is quite similar to the proof of Theorem 1. 

Lemma 7. A monotone procedure (non-degenerate or degenerate) is Bayes with
respect to a discrete a priori distribution $F^*$ which concentrates its probability at $n$
points; each interval $(\omega_{i-1}, \omega_i]$, $i = 1, \ldots, n$, contains a mass-point of $F^*$, but
the location of the mass-point in the interval is arbitrary. The non-degenerate monotone
procedure is uniquely Bayes with respect to $F^*$ except for the randomizations
$\lambda_1, \ldots, \lambda_{n-1}$.


Proof. Consider the difference of the a posteriori risks for actions $i + 1$ and $i$:

$$r_x(i + 1) - r_x(i) = \int [L_{i+1}(\omega) - L_i(\omega)]\, p(x, \omega)\, dF(\omega),$$

where $F$ is some a priori distribution. When $r_x(i + 1) - r_x(i) < 0$, action $i + 1$
is preferred to action $i$, and when $r_x(i + 1) - r_x(i) > 0$, action $i$ is preferred
to action $i + 1$. Since $L_{i+1}(\omega) - L_i(\omega)$ changes sign once, from positive to negative
as $\omega$ increases, by Theorem 3 of [3] $r_x(i + 1) - r_x(i)$ has at most one zero
counting multiplicities, and if it changes sign once as $x$ increases, it changes
from positive to negative. Thus, there exists an $x'$ such that for $x < x'$ action $i$
is preferred to $i + 1$ and for $x > x'$ action $i + 1$ is preferred to action $i$.
A monotone decision procedure characterized by $(x_1^0, \ldots, x_{n-1}^0;\ \lambda_1^0, \ldots, \lambda_{n-1}^0)$
will thus be Bayes against $F$ if the system of equations

$$\int [L_{i+1}(\omega) - L_i(\omega)]\, p(x_i^0, \omega)\, dF(\omega) = 0, \tag{8}$$

$i = 1, \ldots, n - 1$, is satisfied. Let $w_i$ be an arbitrary point in the interval
$(\omega_{i-1}, \omega_i]$, $i = 1, \ldots, n$.
We assert that weights $f_i$, $f_i > 0$, $\sum f_i = 1$, can be determined such that if
$F^*$ is the distribution which assigns probability $f_i$ to the point $w_i$, then
$(x_1^0, \ldots, x_{n-1}^0;\ \lambda_1^0, \ldots, \lambda_{n-1}^0)$
is Bayes against $F^*$. This can be seen as follows. The system of equations (8)
becomes

$$\sum_{j=1}^{i+1} [L_{i+1}(w_j) - L_i(w_j)]\, p(x_i^0, w_j)\, f_j = 0, \tag{9}$$

$i = 1, \ldots, n - 1$. The $(n - 1) \times n$ coefficient matrix $A = (a_{ij})$ of the system
(9) has the form:

(1) $a_{ij} > 0$ for $j \le i$,
(2) $a_{i,i+1} < 0$,
(3) $a_{ij} = 0$ for $j > i + 1$.

But any such system of equations has a solution $(f_1, \ldots, f_n)$ such that $f_i > 0$,
$\sum f_i = 1$. Consider the first equation,

$$a_{11} f_1 - |a_{12}|\, f_2 = 0.$$

Choose any positive value whatsoever for $f_1$, and solve for $f_2$. Clearly, $f_2 > 0$.
Substitute these values of $f_1$ and $f_2$ into

$$a_{21} f_1 + a_{22} f_2 - |a_{23}|\, f_3 = 0$$

and solve for $f_3$. Clearly, $f_3 > 0$, and so on. The solution $(f_1, \ldots, f_n)$ can be
normalized so that $\sum f_i = 1$. Thus, $(x_1^0, \ldots, x_{n-1}^0;\ \lambda_1^0, \ldots, \lambda_{n-1}^0)$ is Bayes
against $F^*$.
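
The forward substitution used in this proof can be sketched numerically. The following is only an illustration under stated assumptions: the function name `positive_weights` and the matrix `A` are hypothetical, and `A` is chosen merely to have the sign pattern (1) to (3) above, not taken from any particular loss function.

```python
from fractions import Fraction

def positive_weights(a):
    """Solve sum_j a[i][j] * f[j] = 0 for i = 0..n-2 by forward substitution.

    Assumes the sign pattern of the proof: a[i][j] > 0 for j <= i,
    a[i][i+1] < 0, and a[i][j] = 0 for j > i + 1.
    Returns weights f_1..f_n that are positive and sum to 1.
    """
    n = len(a) + 1
    f = [Fraction(1)]  # any positive starting value works; it is normalized later
    for i in range(n - 1):
        # The part of row i already determined is a sum of positive terms.
        partial = sum(Fraction(a[i][j]) * f[j] for j in range(i + 1))
        # Dividing by |a[i][i+1]| therefore gives a positive next weight.
        f.append(partial / -Fraction(a[i][i + 1]))
    total = sum(f)
    return [w / total for w in f]

# Hypothetical 3 x 4 coefficient matrix with the required sign pattern.
A = [
    [2, -1, 0, 0],
    [1, 3, -2, 0],
    [1, 1, 4, -5],
]
weights = positive_weights(A)
```

Exact rational arithmetic is used so that the equations are satisfied exactly rather than to rounding error; each new weight is a ratio of positive quantities, which is the whole content of the argument.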

The uniqueness except for $\lambda_1, \ldots, \lambda_{n-1}$ is established by the fact that the
zeros of

$$\int [L_{i+1}(\omega) - L_i(\omega)]\, p(x, \omega)\, dF^*(\omega),$$

$i = 1, \ldots, n - 1$, are unique.

THEOREM 4. All monotone procedures are admissible. 

Since the proof of this theorem duplicates that of Theorem 1, it will be omitted. 
Lemma 8 below is the analog of Lemma 4. It strengthens slightly the results of 
Theorem 4. 

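Lemma 8. If $\varphi^*$ and $\varphi^0$ are two monotone procedures and $\rho(\omega, \varphi^*) = \rho(\omega, \varphi^0)$,
then $x_i^* = x_i^0$, $i = 1, \ldots, n - 1$, and $\lambda_i^* = \lambda_i^0$ for all $i$ for which $\mu(x_i^0) > 0$.

Proof. Since $\varphi^*$ and $\varphi^0$ are uniquely Bayes except for randomizations,
$\rho(\omega, \varphi^*) = \rho(\omega, \varphi^0)$ trivially implies that $x_i^* = x_i^0$, $i = 1, \ldots, n - 1$.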

As in Lemma 4, 


« n—1 i 


p(x, w) « 2d [Li(w) — Liste) | , ¢3(2) - > eo) |} dy(x) 


j= ) 


p(w, ¢) — plw, ¢*) 


a 


and 
n—1 


0 = ps [L:(w) — Li1(w)|p(ai , w)n; 


i=] 


where 


n= > (a5 — AP)u(at), «= Ks = «(|g S 6, 25 = 2}. 
xK 


When this expression is evaluated at $w_1, \ldots, w_n$, the system of equations $\eta A = 0$
is produced, where $\eta = (\eta_1, \ldots, \eta_{n-1})$ and $A = (a_{ij})$ is an $(n - 1) \times n$ matrix
satisfying

(1) $a_{ij} < 0$ for $j \le i$, $i = 1, \ldots, n - 1$,

(2) $a_{i,i+1} > 0$, $i = 1, \ldots, n - 1$,

(3) $a_{ij} = 0$ for $j > i + 1$, $i = 1, \ldots, n - 1$.

But the only solution to such a system of equations is $\eta = (0, 0, \ldots, 0)$. The
last equation $a_{n-1,n}\eta_{n-1} = 0$ implies $\eta_{n-1} = 0$. If $\eta_{n-1} = 0$ is substituted into the
next to last equation, the resulting equation $a_{n-1,n-1} \cdot 0 + a_{n-2,n-1}\eta_{n-2} = 0$ implies
$\eta_{n-2} = 0$, and so on. But $\eta_i = 0$, $i = 1, \ldots, n - 1$, implies the stated result.


6. Estimation with the special loss function of Section 5. If $\omega \in \Omega$ is the true
parameter point and $\omega$ is estimated to be $a$, then the loss is $L(\omega, a)$ where

$$L(\omega, a) = \begin{cases} c > 0, & a < \omega, \\ \psi(a - \omega), & a \ge \omega, \end{cases}$$

where $\psi(\xi)$ is a monotone increasing function of $\xi$ with $\psi(0) = 0$.

The method of proof of Theorem 5 below closely parallels that of the first part 
of Theorem 3. Unlike the absolute error loss function, this loss function does not 
admit an easy proof of the uniqueness of the Bayes estimate which is required 
for this type of admissibility proof. In fact, the indications are that uniqueness 
fails to hold, but no proof of the inadmissibility of a bounded, continuous, mono- 
tone estimate has been obtained as yet. However, a weaker positive result in 
this direction is the following: 

THEOREM 5. All bounded, continuous, monotone estimates are Bayes estimates. 

The proof is omitted. 

Finally, we remark that the same kind of results can be obtained for the case 
where the loss functions are such that the error is constant for overestimation 
and arbitrarily monotonically increasing for underestimation. 


7. Minimax results for the discrete absolute error loss function. For the n-action
problem with discrete absolute error loss function it is not true, in general,
that $\min_\varphi \max_F \rho(F, \varphi) = \max_F \min_\varphi \rho(F, \varphi)$, although $\inf_\varphi \sup_F \rho(F, \varphi) =
\sup_F \inf_\varphi \rho(F, \varphi)$ provided $L_i(\omega) \ge 0$ and the value of the game is allowed to be
infinite [5]. Most often the game fails to have the property that the $F$ player has
a minimax strategy. The difficulties stem from two sources. The space $\{F\}$
of all probability distributions on the real line is not compact, and the loss functions
are discontinuous at $\omega_i$, $i = 1, \ldots, n - 1$. However, the following result
is true. Suppose $\Omega$ consists of a finite number of points $\{\omega_1, \ldots, \omega_N\}$ where
$n \le N$ and $\omega_i < \omega_{i+1}$, $i = 1, \ldots, N - 1$, and $L(\omega_i, j) = c\,|l - j|$ for $\omega_i \in S_l$.
For this game structure it is very easy to establish (cf. [3]) that

$$\max_F \min_\varphi \rho(F, \varphi) = \min_\varphi \max_F \rho(F, \varphi),$$

where $\{F\}$ is the space of all discrete probability distributions defined on $\Omega$.

It will be assumed that all the previous assumptions of Sec. 2 apply equally 
well here. 

Since the class of monotone procedures is essentially complete, there exists at 
least one monotone minimax strategy. The character of the monotone strategy 
has been well-defined, but nothing has been said as yet about the structure of 
nature’s minimax strategies. The following lemmas have this as their aim. 

Lemma 9. Let $v$ be the value of the game, and let $\varphi^0$ be a monotone minimax strategy.
If $T_{\varphi^0} = \{\omega \mid \rho(\omega, \varphi^0) = v\}$, then $T_{\varphi^0}$ contains points in each region
$S_i \subset \Omega$, $i = 1, \ldots, n$, for which $\varphi^0$ has a corresponding non-degenerate interval
in $X$ in which action $i$ is taken.

Proof. Suppose that for some $i$, $T_{\varphi^0} \cap S_i = \emptyset$ (the empty set). Then a minimax
strategy $F^0$ for nature will not concentrate any probability in the region $S_i$.
Since for all $\varphi$, $\rho(F^0, \varphi) \ge \rho(F^0, \varphi^0) = v$, $\varphi^0$ is Bayes with respect to $F^0$. By Lemma
5, $\varphi^0$ cannot have a non-degenerate interval for action $i$.

Lemma 10. If the monotone minimax strategy $\varphi^0$ involves $k$ non-degenerate intervals,
there exists a minimax strategy $F^0$ for nature whose spectrum consists of $k$
points, each point belonging to a region $S_i$ for which the action $i$ interval in $X$ is
non-degenerate.

Proof. By Lemma 9, in each region $S_i$ for which action $i$ has a non-degenerate
interval there exists at least one $\omega \in \Omega$ such that $\rho(\omega, \varphi^0) = v$. Choose one such
point from each of the eligible $S_i$. By Lemma 3a, it is possible to construct a
distribution $F^0$ concentrating its probability at these points with respect to
which $\varphi^0$ is Bayes. Since for all $\varphi$, $\rho(F^0, \varphi) \ge \rho(F^0, \varphi^0) = v$, $F^0$ is minimax.

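When $n = 3$ and $p(x, \omega)$ is Pólya-type 3, with continuous second derivatives,
there is a constructive method of obtaining the monotone minimax strategy.
Define

$$A_1(\omega) = c \int_{x_1}^{x_2} p(x, \omega)\, d\mu(x) + 2c \int_{x_2}^{\infty} p(x, \omega)\, d\mu(x) + c(1 - \lambda_1)\, p(x_1, \omega)\, \mu(x_1) + [c\lambda_2 + 2c(1 - \lambda_2)]\, p(x_2, \omega)\, \mu(x_2),$$

$$A_2(\omega) = c \int_{-\infty}^{x_1} p(x, \omega)\, d\mu(x) + c \int_{x_2}^{\infty} p(x, \omega)\, d\mu(x) + c\lambda_1\, p(x_1, \omega)\, \mu(x_1) + c(1 - \lambda_2)\, p(x_2, \omega)\, \mu(x_2),$$

$$A_3(\omega) = 2c \int_{-\infty}^{x_1} p(x, \omega)\, d\mu(x) + c \int_{x_1}^{x_2} p(x, \omega)\, d\mu(x) + [2c\lambda_1 + c(1 - \lambda_1)]\, p(x_1, \omega)\, \mu(x_1) + c\lambda_2\, p(x_2, \omega)\, \mu(x_2).$$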
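Let $\varphi$ be the monotone procedure characterized by $(x_1, x_2;\ \lambda_1, \lambda_2)$. For $\omega \in S_1$,
$\rho(\omega, \varphi) = A_1(\omega)$; for $\omega \in S_2$, $\rho(\omega, \varphi) = A_2(\omega)$; and for $\omega \in S_3$, $\rho(\omega, \varphi) = A_3(\omega)$. Let

$$\omega_{i_1} = \max \{\omega_i \mid \omega_i \in S_1\}, \qquad \omega_{i_2} = \min \{\omega_i \mid \omega_i \in S_2\},$$

$$\omega_{i_3} = \max \{\omega_i \mid \omega_i \in S_2\}, \qquad \omega_{i_4} = \min \{\omega_i \mid \omega_i \in S_3\}.$$

Since $p(x, \omega)$ is Pólya-type 3, $A_1(\omega)$ is a monotone increasing function of $\omega$, $A_3(\omega)$
is a monotone decreasing function of $\omega$, and $A_2(\omega) - a$ as a function of $\omega$ has at
most two changes of sign (for any $a$). Thus

$$\max_{\omega_i \in S_1} \rho(\omega_i, \varphi) = \rho(\omega_{i_1}, \varphi) = A_1(\omega_{i_1}),$$

$$\max_{\omega_i \in S_2} \rho(\omega_i, \varphi) = \max \{\rho(\omega_{i_2}, \varphi),\ \rho(\omega_{i_3}, \varphi)\} = \max \{A_2(\omega_{i_2}),\ A_2(\omega_{i_3})\},$$

$$\max_{\omega_i \in S_3} \rho(\omega_i, \varphi) = \rho(\omega_{i_4}, \varphi) = A_3(\omega_{i_4}).$$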


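To obtain the monotone minimax strategy choose $(x_1, x_2;\ \lambda_1, \lambda_2)$ such that

$$\rho(\omega_{i_1}, \varphi) = \rho(\omega_{i_2}, \varphi) = \rho(\omega_{i_4}, \varphi) \ge \rho(\omega_{i_3}, \varphi), \tag{10}$$

or, if this is impossible, choose $(x_1, x_2;\ \lambda_1, \lambda_2)$ such that

$$\rho(\omega_{i_1}, \varphi) = \rho(\omega_{i_3}, \varphi) = \rho(\omega_{i_4}, \varphi) \ge \rho(\omega_{i_2}, \varphi). \tag{11}$$

It is clear that the monotone strategy $(x_1, x_2;\ \lambda_1, \lambda_2)$, where $x_1, x_2, \lambda_1, \lambda_2$ are
defined by (10) or (11), is a minimax strategy.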

Either the system (10) or the system (11) has a solution since the statistician 
has a minimax strategy and this strategy must involve three non-degenerate 
intervals. The latter statement is proved as follows: 

Lemma 11. $v < c$.

Proof. Consider the strategy $\varphi = (\varphi_1, \varphi_2, \varphi_3)$ where $\varphi_i(x) = 1/3$, $i = 1, 2, 3$. Then
$\rho(\omega, \varphi) \le c$ for all $\omega$. Therefore, $v \le c$. But there exists a monotone strategy
which improves uniformly on $\varphi$ by Theorem 1 of [2]. Therefore, $v < c$.

Now consider the various cases for which there are at most two non-degenerate 
intervals. 


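Case 1. $x_2 = \infty$: $\rho(\omega, \varphi) \ge c$ for $\omega \in S_3$.
Case 2. $x_1 = -\infty$: $\rho(\omega, \varphi) \ge c$ for $\omega \in S_1$.
Case 3. $x_1 = x_2$: $\rho(\omega, \varphi) \ge c$ for $\omega \in S_2$.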


Therefore, a minimax strategy for nature must involve three non-degenerate 
intervals.


SADDLE-POINT METHODS FOR THE MULTINOMIAL
DISTRIBUTION¹


By I. J. Good


1. Summary. Many problems in the theory of probability and statistics can
be solved by evaluating coefficients in generating functions, or, for continuous
differentiable distributions, by an analogous process with Laplace or Fourier
transforms. As pointed out for example by H. E. Daniels [2], these problems can
often be solved by asymptotic series derived by the saddle-point method from
integrals containing a large parameter. Daniels gave a form of saddle-point theorem
that is convenient for applications to probability and statistics. In the present
paper we extend the theorem in various directions and give some applications
to distributions connected with the multinomial distribution, especially
to the distribution of $\chi^2$ and to the distribution of the maximum entry in a
multinomial distribution.


2. Introduction. The use of asymptotic formulae in practical statistics has, 
historically, been partly experimental. As much regard has been paid to numeri- 
cal examples as to analytical bounds on the errors. Analytical bounds have a 
habit of being much larger than life, and are often of less practical value than 
the second term of an asymptotic expansion. If the second term is small then we 
can be happier about relying on the first term. If it is not small then we learn 
even more, and we also become interested in finding the third term. There is 
therefore a definite need in statistics for two-term and three-term asymptotic 
expansions. 

In Sec. 6 we give three rather general theorems about asymptotic expansions 
of integrals, double integrals, and multidimensional integrals, in a form conven- 
ient for statistical applications. These theorems are adequately motivated by 
the earlier sections. In Sec. 3 we make some preliminary remarks about the 
multinomial distribution and tests for it. In Sec. 4 we give some examples to show 
how generating functions arise for these tests. In Sec. 5 we give brief descriptions 
of continuous and discrete methods of extracting coefficients from generating 
functions, and the continuous method is elaborated in Sec. 6, where the general 
theorems on asymptotic expansions are given. These theorems are applied in 
Secs. 7 and 8 to the distribution of the maximum entry and to that of $\chi^2$ for a
multinomial distribution. (When we refer to $\chi^2$ we mean the statistic for testing
goodness of fit, not the gamma-variate.) In Secs. 9 and 10 we give some examples 
of the discrete method of extracting coefficients and some combinatorial formulae
for the distribution of the maximum entry.


3. Significance tests for multinomial distributions. Multinomial distributions
arise, for example, in the following problems. 

(a) Testing the goodness of fit of observations to a theoretical continuous dis- 
tribution. 

(b) Testing whether a sequence of non-negative integral variables arose from 
independent observations of a Poisson variable of unknown mean. (See, for ex- 
ample, Rao and Chakravarti [21] or Hoel [10], page 198.) 

(c) Testing whether digits are adequate as random sampling numbers.

In all three of these applications the equiprobable multinomial distribution 
(with all cell probabilities equal) is of special interest and therefore, in the pres- 
ent paper, more attention will be given to this case than to the general multi- 
nomial distribution. Furthermore the theory is simpler for the special case. 

Let our multinomial distribution have $t$ categories, sample size $N$, and sample
$n_1, n_2, \ldots, n_t$, where

$$n_1 + n_2 + \cdots + n_t = N.$$

Let the null hypothesis be that the cell probabilities are $p_1, p_2, \ldots, p_t$, the
most interesting case being when $p_1 = p_2 = \cdots = p_t = 1/t$.

The null hypothesis may be tested by any of the following tests among others. 
Which tests are appropriate will depend largely on what non-null hypotheses 
are judged to have appreciable initial probabilities. 

(A) For the non-null hypothesis we might assume that the cell probabilities
are $q_1, q_2, \ldots, q_t$, where $q_1, q_2, \ldots, q_t$ are unknown but are assumed to have
what I call a "type II" initial probability density in the simplex (generalised
tetrahedron) $q_1 + q_2 + \cdots + q_t = 1$. (By a type II distribution I mean one
obtained when sampling from a superpopulation in order to determine an ordinary
population.) We should then, by Bayes's theorem, arrive at a factor in
favour of the non-null hypothesis provided by the observations $n_1, n_2, \ldots, n_t$.
(For the terminology and for the theorem of the weighted average of factors
see Good [3], Chapter 6.) This factor would be the weighted average of

$$\prod_{r=1}^{t} q_r^{n_r} \Big/ \prod_{r=1}^{t} p_r^{n_r}$$

with weights proportional to the initial probability density. For example, if the
initial probability density is taken as proportional to $(p_1 p_2 \cdots p_t)^a$ $(a > -1)$
(cf. Perks [20], Good [4], or, for the uniform distribution $a = 0$, Lidstone [18]
and Jeffreys [13]), the factor in favour of the non-null hypothesis turns out to be

$$F(a) = \frac{(ta + t - 1)!\, \prod_{r=1}^{t} (n_r + a)!}{(a!)^t\, (N + ta + t - 1)!\, \prod_{r=1}^{t} p_r^{n_r}}.$$
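
As a rough numerical companion to the factor $F(a)$, the sketch below evaluates $\log F(a)$ with log-gamma functions, reading each factorial $x!$ as $\Gamma(x + 1)$ so that non-integer $a$ is allowed. The function name `log_factor` and the sample inputs are assumptions made for illustration only.

```python
from math import exp, lgamma, log

def log_factor(counts, probs, a=0.0):
    """Log of the factor F(a) in favour of the non-null hypothesis.

    counts: observed cell frequencies n_1..n_t
    probs:  null cell probabilities p_1..p_t (all assumed positive)
    a:      exponent of the symmetric initial density, a > -1
    Factorials are evaluated as gamma functions, x! = Gamma(x + 1).
    """
    t = len(counts)
    n_total = sum(counts)
    # (ta + t - 1)! / ((a!)^t (N + ta + t - 1)!)
    log_f = lgamma(t * a + t) - t * lgamma(a + 1) - lgamma(n_total + t * a + t)
    # product of (n_r + a)!
    log_f += sum(lgamma(n + a + 1) for n in counts)
    # divide by the null likelihood, product of p_r^{n_r}
    log_f -= sum(n * log(p) for n, p in zip(counts, probs))
    return log_f

# Two cells, one observation in each, uniform initial density (a = 0).
example_factor = exp(log_factor([1, 1], [0.5, 0.5], a=0.0))
```

Working on the log scale avoids overflow of the factorials for realistic sample sizes; for the small example the factor is 2/3, which can be confirmed by integrating $q(1 - q)$ over the unit interval and dividing by $1/4$.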


In accordance with what I called the "Bayes/non-Bayes synthesis" in lectures
in Princeton and Chicago in 1955, we could, for some guessed value of $a$,
regard $F(a)$ simply as a statistic and work with its distribution, given only the
null hypothesis. [The Bayes/non-Bayes synthesis is the following technique for
synthesizing subjective and objective methods in statistics. (i) We use the
neo/Bayes-Laplace philosophy in order to arrive at a factor, $F$, in favour of the
non-null hypothesis. For the particular case of discrimination between two simple
statistical hypotheses, the factor in favour of a hypothesis is equal to the
likelihood ratio, but not in general. The neo/Bayes-Laplace philosophy usually
works with inequalities between probabilities, but for definiteness we here assume
that the initial distributions are taken as precise, though not necessarily
uniform. (ii) We then use $F$ as a statistic and try to obtain its distribution on
the null hypothesis, and work with its tail-area probability, $P$. (iii) Finally we
look to see if $F$ lies in the range

$$\left(\frac{1}{30P},\ \frac{3}{10P}\right).$$

If it does not lie in this range we think again.]
(B) The likelihood-ratio test. (See, for example, Wilks [26].) Minus twice the
logarithm of the likelihood ratio is

$$\mu = 2\sum n_r \log_e n_r - 2\sum n_r \log_e p_r - 2N \log_e N,$$

which has, asymptotically, a gamma-variate distribution with $t - 1$ degrees
of freedom. For the equiprobable case

$$\mu = 2\sum n_r \log_e n_r + 2N \log_e t - 2N \log_e N.$$


(C) The chi-squared test (for numerous references see, for example, the Index
of M. G. Kendall [16], Vol. II),

$$\chi^2 = \sum (n_r - Np_r)^2/(Np_r).$$

$\chi^2$ arises as an approximation either to $\mu$ or to a constant plus $2 \log F(a)$, whatever
the value of $a$. In fact, for any initial distribution with positive density
at $(p_1, p_2, \ldots, p_t)$, the log-factor ("weight of evidence") in favour of the
non-null hypothesis, when the null hypothesis is true, is asymptotically of the form

$$\tfrac{1}{2}\chi^2 - K,$$

where $K$ depends only on $N, t, p_1, \ldots, p_t$ and not further on the sampling
frequencies $n_1, n_2, \ldots, n_t$. This, to a neo/Bayes-Laplacian, is the real justification
for the use of $\chi^2$. A similar argument applies to the use of $\chi^2$ for testing
absence of association in contingency tables.
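
To make the relation between the two statistics concrete, here is a minimal sketch that computes $\mu$ and $\chi^2$ from the same frequencies. The function names and the sample frequencies are illustrative assumptions, not data from the paper; empty cells contribute zero to $\mu$ by the usual convention $0 \log 0 = 0$.

```python
from math import log

def likelihood_ratio_statistic(counts, probs):
    """mu = 2*sum n_r log n_r - 2*sum n_r log p_r - 2*N log N (0 log 0 taken as 0)."""
    n_total = sum(counts)
    # Algebraically the same as the three-term expression for mu.
    terms = sum(n * log(n / (n_total * p)) for n, p in zip(counts, probs) if n > 0)
    return 2.0 * terms

def chi_squared_statistic(counts, probs):
    """chi^2 = sum (n_r - N p_r)^2 / (N p_r)."""
    n_total = sum(counts)
    return sum((n - n_total * p) ** 2 / (n_total * p) for n, p in zip(counts, probs))

counts = [18, 22, 25, 15, 20]      # illustrative frequencies, N = 100
probs = [0.2] * 5                  # equiprobable null hypothesis, t = 5
mu = likelihood_ratio_statistic(counts, probs)
chi2 = chi_squared_statistic(counts, probs)
```

For frequencies this close to their expectations the two values nearly coincide, which is the sense in which $\chi^2$ approximates $\mu$.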

Among the advantages and disadvantages of $\mu$ as compared with $\chi^2$ are (i) $\mu$
more closely puts the possible samples (for given $N, t, p_1, p_2, \ldots, p_t$) in order
of their likelihoods on the null hypothesis, (ii) when tables of $2n \log_e n$ are available,
the calculation of $\mu$ can be done by additions, subtractions, and table-lookups
only, but the calculation is less "well-conditioned" than for $\chi^2$, in the
sense that more significant figures must be held, (iii) $\chi^2$ is a simpler mathematical
function of the observations and it should be easier to approximate closely
to its distribution, given the null hypothesis. (See Sec. 8, where a method is
discussed for improving on the usual gamma-variate approximation.)

(D) Number of zero entries. Sometimes the (possibly vague) non-null hypothesis
has a lot of type II probability density close to regions where several $q_r$'s
vanish. In this case a reasonable statistic is the number of zero $n_r$'s. The probability
that the number of zero $n_r$'s will be exactly $s$ is


us 


La -@g* -Ld-qt+p)*+ Dil—-q+p,+p)* - 


\ # 


where 


9 = Pr + Bry tes + Pr, 


and where the outer summation is over all unequal values of m, T2, --- 
For the equiprobable null hypothesis the above probability reduces to 


(C2) - ME + 1} 


and is discussed in some detail by Rao and Chakravarti [21]. 
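
For the equiprobable case the probability of exactly $s$ zero entries can be computed directly. The sketch below uses the classical inclusion-exclusion (occupancy) formula $\binom{t}{s}\sum_j (-1)^j \binom{t-s}{j}(1 - (s+j)/t)^N$; this is an assumption about the form of the expression, which may be arranged differently in the printed formula, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_zero_cells(s, n_total, t):
    """P(exactly s of the t cells are empty) for N equiprobable observations.

    Classical inclusion-exclusion (occupancy) formula:
    C(t, s) * sum_j (-1)^j C(t - s, j) (1 - (s + j)/t)^N.
    """
    inner = sum(
        (-1) ** j * comb(t - s, j) * Fraction(t - s - j, t) ** n_total
        for j in range(t - s + 1)
    )
    return comb(t, s) * inner

# Distribution of the number of empty cells for N = 5 observations in t = 3 cells.
distribution = [prob_zero_cells(s, 5, 3) for s in range(4)]
```

Exact fractions are used because the alternating sum cancels heavily when $t$ is large, and floating point would lose accuracy there.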
(E) Maximum entry,

$$\max_r n_r.$$

This statistic would have some application to some work of Guttman [8], as
pointed out by Greenwood and Glasgow [7]. The latter authors considered the
distribution of both maximum and minimum entries in a binomial distribution
and discussed also a special trinomial example. Their methods and results
hardly overlap with ours. The distribution of the statistic $\max_r n_r$ can be obtained
by means of the saddle-point method and is discussed further in Secs.
4, 7, and 9.


4. Some generating functions, mainly related to the multinomial distribution. 
4.1. Maximum and minimum entries. Let $P(\text{all } n_r \le m)$ be denoted by

$$P(m \mid N, t),$$

which of course depends on $m, N, t, p_1, p_2, \ldots, p_t$. For the equiprobable hypothesis
$p_1 = p_2 = \cdots = p_t$ we denote the probability by $P_0(m \mid N, t)$, a function
of $m$, $N$, and $t$ only. Then it can be at once verified that (for all $x$)

$$\sum_{N=0}^{\infty} \frac{x^N}{N!}\, P(m \mid N, t) = \prod_{r=1}^{t} \left(1 + p_r x + \cdots + \frac{p_r^m x^m}{m!}\right) \tag{4.1}$$

and in particular that

$$\sum_{N=0}^{\infty} \frac{t^N x^N}{N!}\, P_0(m \mid N, t) = \left(1 + x + \frac{x^2}{2!} + \cdots + \frac{x^m}{m!}\right)^t. \tag{4.2}$$
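
Formula (4.2) gives a direct, if slow, way of computing $P_0(m \mid N, t)$ exactly: expand the $t$-th power of the truncated exponential series and read off the coefficient of $x^N$. The following sketch does this with rational arithmetic; the function name `prob_max_at_most` is an assumption for illustration, and this brute-force expansion is not the saddle-point method of the later sections.

```python
from fractions import Fraction
from math import factorial

def prob_max_at_most(m, n_total, t):
    """P_0(m | N, t): probability that no cell count exceeds m (equiprobable case).

    Computed as N!/t^N times the coefficient of x^N in
    (1 + x + x^2/2! + ... + x^m/m!)^t, following (4.2).
    """
    base = [Fraction(1, factorial(k)) for k in range(m + 1)]
    poly = [Fraction(1)]                       # coefficients of the running power
    for _ in range(t):
        new = [Fraction(0)] * min(len(poly) + m, n_total + 1)
        for i, c in enumerate(poly):
            for k, b in enumerate(base):
                if i + k <= n_total:           # terms above x^N are never needed
                    new[i + k] += c * b
        poly = new
    coeff = poly[n_total] if n_total < len(poly) else Fraction(0)
    return coeff * factorial(n_total) / Fraction(t) ** n_total

# Probability that three observations fall in three different cells (all n_r <= 1).
all_distinct = prob_max_at_most(1, 3, 3)
```

The cost grows like $t \cdot N \cdot m$ coefficient operations, which is why an asymptotic formula becomes attractive for large $N$ and $t$.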

Similarly the probability that the minimum entry is at least $m$ is the coefficient
of $x^N$ in

$$N! \prod_{r=1}^{t} \left(\frac{p_r^m x^m}{m!} + \frac{p_r^{m+1} x^{m+1}}{(m + 1)!} + \cdots\right) \tag{4.3}$$

or, for the equiprobable multinomial distribution, in

$$\frac{N!}{t^N} \left(\frac{x^m}{m!} + \frac{x^{m+1}}{(m + 1)!} + \cdots\right)^t. \tag{4.4}$$

The above four generating functions are simple generalisations of the one in
Proposition XXIII of Whitworth [25].

4.2. Probability that all $n_r$ are even or all are odd. It may be noticed, for its
entertainment, that the probability that all the $n_r$'s are even, for an equiprobable
multinomial distribution, is equal to the coefficient of $x^N$ in

$$\frac{N!}{t^N} \left(1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \cdots\right)^t,$$

so the probability is

$$2^{-t} \sum_{r=0}^{t} \binom{t}{r} \left(1 - \frac{2r}{t}\right)^N \approx 2^{-t+1} \left(1 + e^{-2N/t}\right)^t$$

if $t$ is large and $N$ is even. Similarly the probability that all $n_r$'s are odd is

$$2^{-t} \left\{1 - \binom{t}{1}\left(1 - \frac{2}{t}\right)^N + \binom{t}{2}\left(1 - \frac{4}{t}\right)^N - \cdots\right\}$$

and is approximately equal to $2^{-t+1}(1 - e^{-2N/t})^t$ if $t$ is large and $N \equiv t \pmod 2$.
4.3. Chi-squared for the equiprobable multinomial distribution. We write, as is
customary,

$$\chi^2 = \sum_{r=1}^{t} (n_r - N/t)^2/(N/t)$$

and we have

$$\chi^2 + N = \frac{tS}{N}, \tag{4.5}$$

where $S = \sum_{r=1}^{t} n_r^2$, so that the problem of the distribution of $\chi^2$ is essentially
the same as that of $S$. Now we can at once verify that, $M$ being an integer,

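$$\sum_{N=0}^{\infty} \sum_{M} \frac{t^N x^N y^M}{N!}\, P(S = M) = \left(\sum_{n=0}^{\infty} \frac{x^n y^{n^2}}{n!}\right)^t. \tag{4.6}$$

In the hope that they may suggest to the reader some improvements in the
analysis of this paper, we mention a few facts about the function

$$F(x, y) = \sum_{n=0}^{\infty} \frac{x^n y^{n^2}}{n!}.$$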
We have 


"(e®, e”) i *F(e§, 
ts 


ok (2y)" cos” (n8)/n!, 


n=l 


x 


“= r 

Finally we mention that the function F(e"'Y’, y) is discussed by Nassif [19} 
and Tims [23]. 

4.4. Chi-squared for a contingency table. For an $r$ by $s$ contingency table
$(n_{ij})$ $(i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, s;\ \sum_{i,j} n_{ij} = N,\ \sum_j n_{ij} = n_{i\cdot},\ \sum_i n_{ij} = n_{\cdot j})$,
the probability of the table, given the borders, and assuming no association,
is, as is well known,


2 
u* 
_ ——t+yeiuv? 


€ 2a * du, (Ra > 0). 


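$$\frac{\prod_i n_{i\cdot}!\ \prod_j n_{\cdot j}!}{N!\ \prod_{i,j} n_{ij}!}.$$

Also

$$\chi^2 = \sum_{i,j} \frac{(n_{ij} - n_{i\cdot} n_{\cdot j}/N)^2}{n_{i\cdot} n_{\cdot j}/N} = N(S - 1),$$

where

$$S = \sum_{i,j} \frac{n_{ij}^2}{n_{i\cdot} n_{\cdot j}}.$$

The problem of the distribution of $\chi^2$ is equivalent to that of $S$, and
$\Pr(S = M)$ = coefficient of $z^M \prod_{i,j} x_i^{n_{i\cdot}} y_j^{n_{\cdot j}}$ in

$$\frac{\prod_i n_{i\cdot}!\ \prod_j n_{\cdot j}!}{N!} \prod_{i,j} \sum_{n=0}^{\infty} \frac{x_i^n y_j^n z^{n^2/(n_{i\cdot} n_{\cdot j})}}{n!}. \tag{4.7}$$

The moment generating function of $S$ is therefore the coefficient of $\prod_{i,j} x_i^{n_{i\cdot}} y_j^{n_{\cdot j}}$
in

$$\frac{\prod_i n_{i\cdot}!\ \prod_j n_{\cdot j}!}{N!} \prod_{i,j} \sum_{n=0}^{\infty} \frac{x_i^n y_j^n}{n!} \exp\left\{\frac{u n^2}{n_{i\cdot} n_{\cdot j}}\right\},$$

where $u$ denotes the argument of the moment generating function.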

5. Methods of evaluating coefficients in generating functions. 

5.1. Continuous methods. A coefficient in a power series may be expressed as a 
contour integral by means of Cauchy’s formula, and then this integral may be 
expanded into an asymptotic series by a saddle-point method. In the present 
paper examples are given in Secs. 7 and 8. For continuous random variables
with probability densities the analogous process is the use of the Fourier or
Laplace transform instead of Cauchy's integral.

5.2. Discrete methods. Let

$$g(\theta) = \sum_{n=-\infty}^{\infty} c_n e^{in\theta}.$$

Then if $u$ is a positive integer we have

$$\frac{1}{u} \sum_{r=0}^{u-1} g\left(\frac{2\pi r}{u}\right) = \sum_{n=-\infty}^{\infty} c_{nu}. \tag{5.1}$$

In particular, if $c_n = 0$ whenever $|n| \ge m$, then

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} g(\theta)\, d\theta = \frac{1}{u} \sum_{r=0}^{u-1} g\left(\frac{2\pi r}{u}\right) \tag{5.2}$$

whenever $u \ge m$. (Compare D. G. Kendall [14], and Good [5].) Given a generating
function that happens to be a polynomial, $h(x)$, we can extract the coefficient
of $x^r$ by using (5.2) with $g(\theta) = e^{-ir\theta} h(e^{i\theta})$. An example will be given in
Sec. 9, and it should be noticed that the method gives an exact formula (from
which an asymptotic formula can sometimes be deduced). For continuous random
variables the analogous process would be the use of Poisson's summation
formula: see, for example, Krishnan [17].
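
The discrete method of (5.2) can be tried on a small polynomial. The sketch below is only an illustration: the function name is hypothetical, and the choice $u = \deg h + 1$ is a convenient assumption that always satisfies the condition $u \ge m$ for a coefficient index between 0 and the degree.

```python
import cmath

def coefficient_by_roots_of_unity(h, k):
    """Coefficient of x^k in the polynomial h(x) = sum h[n] x^n, via (5.2).

    g(theta) = exp(-i k theta) * h(exp(i theta)) has Fourier coefficients
    c_n = h[n + k], which vanish for |n| >= len(h); so u = len(h) points suffice.
    """
    u = len(h)
    total = 0j
    for r in range(u):
        theta = 2 * cmath.pi * r / u
        z = cmath.exp(1j * theta)
        h_value = sum(c * z ** n for n, c in enumerate(h))
        total += cmath.exp(-1j * k * theta) * h_value
    # The average over the u-th roots of unity equals c_0, i.e. h[k].
    return (total / u).real

binomial_row = [1, 5, 10, 10, 5, 1]          # coefficients of (1 + x)^5
recovered = coefficient_by_roots_of_unity(binomial_row, 2)
```

In practice $h$ would be available only as a closed form, such as a power of a truncated exponential series, and the sum over roots of unity replaces the explicit expansion.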

When the generating function is an infinite power series we can make similar 
use of (5.1) provided that the series on the right is utterly dominated by its 
largest term. A potential example is given in Sec. 10. 


6. The general asymptotic formulae. 

6.0. In this section we give theorems in a convenient form for applications
to probability and statistics, concerned with the asymptotic expansion of multiple
integrals containing a large parameter. Three terms are given for single integrals,
two for double integrals, and one for multiple integrals. Only the first
two theorems are applied in the present paper, but the multidimensional theorem
seems worth stating since it puts the two-dimensional one into proper perspective.

6.1. The asymptotic expansion of a single integral.

THEOREM 6.1. Let

$$f(z) = \sum_{n=-\infty}^{\infty} a_n z^n$$

be a power-series or polynomial with non-negative real coefficients and a non-vanishing
open domain of convergence (an annulus, or the inside or outside of
a circle). Suppose that the suffixes $r$ at which $a_r > 0$ do not all have a common
factor greater than 1. (This condition is given, though incidentally, by Daniels
[2], p. 646. It is clearly necessary and its sufficiency will follow from the remarks
following Theorem 6.3. The condition can always be forced by a change in the
variable of the generating function.) Let the coefficient of $z^N$ in $(f(z))^t$ be $c(N, t)$.
If $c(N, t) \ne 0$, then there is a unique non-negative real solution, $\rho$, of the equation

$$t\rho\, \frac{d}{d\rho} f(\rho) = N f(\rho); \tag{6.1}$$

and, if in addition $N/t$ is held inside a constant interval, then

$$c(N, t) \sim \frac{[f(\rho)]^t}{\rho^N \sigma \sqrt{2\pi t}} \left\{1 + \frac{1}{24t}(3\lambda_4 - 5\lambda_3^2) + \frac{1}{1152t^2}(168\lambda_3\lambda_5 + 385\lambda_3^4 - 630\lambda_3^2\lambda_4 - 24\lambda_6 + 105\lambda_4^2) + \cdots\right\} \tag{6.2}$$

uniformly as $t \to \infty$, where

$$\lambda_r = \lambda_r(\rho) = \kappa_r(\rho)/\sigma^r, \tag{6.3}$$

where

$$\sigma = \sigma(\rho) = \sqrt{\kappa_2(\rho)}, \tag{6.4}$$

and

$$\kappa_r(\rho) = \left(\frac{\partial}{\partial u}\right)^r \left(\log f(\rho e^u)\right)\Big|_{u=0}. \tag{6.5}$$

If we write $\rho = e^\xi$, we may replace (6.5) by

$$\kappa_r(\rho) = \left(\frac{\partial}{\partial \xi}\right)^r \log f(e^\xi). \tag{6.6}$$

Similarly if $f(z) = \int_{-\infty}^{\infty} a(r) z^r\, dr$, where
(a) $a(r) \ge 0$ for all real $r$, and is continuous,
(b) the integral is convergent for some non-vanishing open interval of positive
values of $z$, then

$$(f(z))^t = \int_{-\infty}^{\infty} c(N, t)\, z^N\, dN,$$

where $c(N, t)$ is still given by (6.2) for positive [negative] $N$, and where the remaining
conclusions are formally the same as in the discrete case. The conclusions
of the theorem (both in the discrete and continuous cases) may be simultaneously
true for positive and negative $N$.

The above theorem can be proved as in Daniels [2] and differs from Daniels’s 
form only in (i) that we do not insist on the condition f(1) = 1, and (ii) that 
we have calculated the third term of the asymptotic series. More terms could 
be worked out on an electronic computer programmed to do algebra. 

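Since the theorem is so similar to the form given by Daniels we shall here
content ourselves with some of the formal details leading to the extra term. We
may suppose without real loss of generality that $f(\rho) = 1$. Write $\kappa_r(\rho) = \kappa_r$ for
short. Then

$$c(N, t) = \frac{1}{2\pi i} \oint \frac{(f(z))^t}{z^{N+1}}\, dz$$

$$= \frac{1}{2\pi\rho^N} \int_{-\pi}^{\pi} (f(\rho e^{i\theta}))^t e^{-iN\theta}\, d\theta$$

$$= \frac{1}{2\pi\rho^N} \int_{-\pi}^{\pi} \exp\left\{t\left(-\tfrac{1}{2}\kappa_2\theta^2 - \tfrac{i}{6}\kappa_3\theta^3 + \cdots\right)\right\} d\theta$$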

pln 6 

1 Lt gee ae K ¢ Ke + 
nS ra an - = eee 
ph 2ae/t d-syi . p( 241 7200? 


Vv - 
kK 
- COS 


r = 1.3.5.---(2n + 1). 


and (6.2) follows from the formula 


V 2s 


(Some of the above algebra can be done with the help of Kendall [16], I, formula 
(3.30), with his x. = 0, since this formula gives the expansion of the exponential 
of a power series.) As a check of the above theorem consider the case $a_r =
e^{-N/t}(N/t)^r/r!$. Then we find that the theorem correctly gives

$$\frac{1}{N!} \sim \frac{e^N}{N^N \sqrt{2\pi N}} \left(1 - \frac{1}{12N} + \frac{1}{288N^2} + \cdots\right).$$
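
The same check can be run numerically. The sketch below applies the three-term form of (6.2) to $f(z) = e^z$, for which the coefficient of $z^N$ in $(f(z))^t$ is exactly $t^N/N!$; this is the paper's check up to a constant factor in $f$. The function name and the choice $N = 10$, $t = 5$ are assumptions made for illustration.

```python
from math import exp, lgamma, log, pi

def saddle_point_poisson_check(n_total, t):
    """Three-term approximation (6.2) to c(N, t) for f(z) = exp(z).

    Here c(N, t) = t^N / N! exactly. For this f, log f(rho e^u) = rho e^u,
    so every cumulant kappa_r equals rho, and (6.1) gives rho = N / t.
    """
    rho = n_total / t
    sigma = rho ** 0.5
    lam = {r: rho / sigma ** r for r in range(3, 7)}   # lambda_r = kappa_r / sigma^r
    second = (3 * lam[4] - 5 * lam[3] ** 2) / (24 * t)
    third = (168 * lam[3] * lam[5] + 385 * lam[3] ** 4
             - 630 * lam[3] ** 2 * lam[4] - 24 * lam[6]
             + 105 * lam[4] ** 2) / (1152 * t ** 2)
    # log of the leading factor [f(rho)]^t / (rho^N sigma sqrt(2 pi t))
    log_lead = t * rho - n_total * log(rho) - log(sigma) - 0.5 * log(2 * pi * t)
    approx = exp(log_lead) * (1 + second + third)
    exact = exp(n_total * log(t) - lgamma(n_total + 1))
    return approx, exact

approx, exact = saddle_point_poisson_check(10, 5)
```

For this $f$ the second and third terms reduce to $-1/(12N)$ and $1/(288N^2)$, so the approximation is Stirling's series for $1/N!$ and its relative error is of order $N^{-3}$.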
A further check of the theorem can be obtained by applying it to a classical
problem in the theory of numbers, namely the enumeration of ways of expressing
a positive integer N as the sum of t squares of numbers 0, ±1, ±2, ···
(different orders counting as different representations). For a detailed
discussion of this problem see Hardy [9], Chapter IX. Here
$c(N, t) = r_t(N)$ in Hardy's notation, and

$$\sum_{N=0}^{\infty} r_t(N)\,x^N = (f(x))^t,$$

where

$$f(x) = \sum_{n=-\infty}^{\infty} x^{n^2} \qquad (|x| < 1).$$


The equation for p becomes

$$t\sum_{n=-\infty}^{\infty} n^2 p^{n^2} = N\sum_{n=-\infty}^{\infty} p^{n^2}. \tag{6.7}$$


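Now

$$f(p) = \sum_{n=-\infty}^{\infty} p^{n^2}
      = \sqrt{\frac{\pi}{\log(1/p)}}\sum_{n=-\infty}^{\infty} e^{-n^2\pi^2/\log(1/p)}
      \sim \sqrt{\frac{\pi}{\log(1/p)}} \quad \text{if } p \text{ is near 1.}$$

If p is near to 1 we may approximate the derivatives of log f(p) by those of

$$-\tfrac12 \log\log\frac{1}{p}.$$

We find that

$$\log\frac{1}{p} \approx \frac{t}{2N}$$

if N/t is large, that

$$\sigma \approx \frac{N\sqrt{2}}{t},$$

and that

$$\kappa_r(p) \approx \frac{(r-1)!}{2}\left(\frac{2N}{t}\right)^r \qquad (r = 1, 2, \cdots),$$

$$\lambda_3 \approx 2\sqrt{2},$$

and Theorem 6.1 gives

$$r_t(N) \approx \frac{1}{2N}\sqrt{\frac{t}{\pi}}\left(\frac{2\pi N e}{t}\right)^{t/2}
   \left(1 - \frac{1}{6t} + \frac{1}{72t^2} + \cdots\right).$$

Clearly what we have here is no better than the elementary result

$$r_t(N) \sim \frac{\pi^{t/2} N^{t/2 - 1}}{\Gamma(\tfrac12 t)} \quad \text{as } N \to \infty,$$
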
though once again the example acts as a check of Theorem 6.1. It would be in- 
teresting to see what the theorem would give if N/t were not assumed to be 
large. The equation (6.7) could be solved iteratively and the result of the theo- 
rem could be compared with known results; for example, the case t = 24 is 
treated in detail by Hardy [9]. 
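The iterative solution mentioned above can be sketched as follows. This is only an illustration, not the paper's procedure: it assumes that (6.7) has the form t Σ n² p^(n²) = N Σ p^(n²) (the saddle-point equation t p f'(p) = N f(p) applied to f(p) = Σ p^(n²)), it works with L = log(1/p), and the names `kappa1` and `solve_saddle` are hypothetical. Plain bisection is used because the ratio on the left decreases steadily as L grows.

```python
import math


def kappa1(L: float) -> float:
    """Return sum(n^2 p^(n^2)) / sum(p^(n^2)) for p = exp(-L), with L > 0."""
    n_max = int(math.sqrt(60.0 / L)) + 2  # terms beyond this are below e^-60
    num = 0.0
    den = 0.0
    for n in range(-n_max, n_max + 1):
        w = math.exp(-L * n * n)
        num += n * n * w
        den += w
    return num / den


def solve_saddle(N: float, t: float, lo: float = 1e-4, hi: float = 50.0) -> float:
    """Bisection for L = log(1/p) solving t * kappa1(L) = N."""
    target = N / t
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if kappa1(mid) > target:
            lo = mid  # ratio too large, so L must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    N, t = 240, 24
    L = solve_saddle(N, t)
    print("log(1/p) =", L, " large-N/t approximation t/(2N) =", t / (2 * N))
```

For N/t large the computed root should agree closely with the approximation log(1/p) ≈ t/(2N); for small N/t the two separate, which is the regime the text suggests exploring.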
On the whole it seems unlikely that the present method could compete in
elegance with the classical theory in which, instead of the saddle-point method,
a so-called "singular series" is obtained by using a contour that comes very
close to the unit circle. It may be that there is scope for the method of
singular series in statistical problems.

6.2. The asymptotic expansion of a double integral. (Compare Hsu [11]; Copson
[1], Sec. 5. These authors do not work out the second term of the expansion.)

THEOREM 6.2. Let

$$f(x, y) = \sum_{r=-\infty}^{\infty}\sum_{s=-\infty}^{\infty} a(r, s)\,x^r y^s$$

(or $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} a(r, s)\,x^r y^s\,dr\,ds$, where
a(r, s) is continuous), where the conditions (6.14), (6.15), and (6.16) of the
(multidimensional) Theorem 6.3 are satisfied with l = 2. Let the coefficient of
$x^M y^N$ in $(f(x, y))^t$ be c(M, N, t), i.e.,

$$(f(x, y))^t = \sum_{M=-\infty}^{\infty}\sum_{N=-\infty}^{\infty} c(M, N, t)\,x^M y^N$$

(or $(f(x, y))^t = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} c(M, N, t)\,x^M y^N\,dM\,dN$),

where the conditions (6.17) to (6.19) of Theorem 6.3 are satisfied with l = 2.
Then there is at most one pair of non-negative real numbers p, p' such that

$$t\,p\,\frac{\partial}{\partial p} f(p, p') = M f(p, p'), \tag{6.8}$$

$$t\,p'\,\frac{\partial}{\partial p'} f(p, p') = N f(p, p'). \tag{6.9}$$

(If (p, p') is a boundary point of the domain of convergence, then f and its
derivatives may be interpreted as limits from within the domain. A similar
understanding applies throughout this paper.)
And if there is such a pair then

ry\jt ; 
(M,N. ~ [f(p, ’))" 


l 
{1 + — | 
Qatpp’* V/A | 24 


12r13 @ + Bro — SA5o — DAR(1 + 40”) — 9rR(1 + 40°) 


Bru — 12dAn @ + 6GAree(1 + 2a’) 


5s + 30X30 Aa SS = OAz0 Aw(I > 4a’) a 2Qre0 Aos al(3 + 2a’) 
18X91 Are a(3 + 2a") — Gro Nos + 4a”) 


30A12 Aor a| + yee > ’ 


ir 
Kre Ko2 K2¢ ; Au 


»; icaey ? 7 = ’ 
A’ , Vv (Xoo eo) 
where
(6.12) 
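where

$$\kappa_{rs} = \kappa_{rs}(p, p') = \left[\left(\frac{\partial}{\partial \xi}\right)^{r}
   \left(\frac{\partial}{\partial \eta}\right)^{s} \log f(e^{\xi}, e^{\eta})\right]_{\xi = \log p,\ \eta = \log p'}, \tag{6.13}$$

and

$$\Delta = \kappa_{20}\kappa_{02} - \kappa_{11}^2.$$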


Here again we merely give some of the formal details of the proof. These 
details should suffice, when combined with the references cited, together with 
the remarks following Theorem 6.3. 

There is no real loss of generality in assuming that f(p, p’) = 1. Then 


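$$c(M, N, t) = \frac{1}{4\pi^2 p^M p'^N}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}
   \exp\Bigl\{t\Bigl(-\tfrac12\kappa_{20}\theta^2 - \kappa_{11}\theta\phi - \tfrac12\kappa_{02}\phi^2
   - \tfrac{i}{6}\bigl(\kappa_{30}\theta^3 + \cdots\bigr) + \cdots\Bigr)\Bigr\}\,d\theta\,d\phi$$

$$\approx \frac{1}{4\pi^2 t\,p^M p'^N}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
   e^{-\frac12(\kappa_{20}\theta^2 + 2\kappa_{11}\theta\phi + \kappa_{02}\phi^2)}
   \Bigl\{1 + \frac{1}{24t}\bigl(\kappa_{40}\theta^4 + 4\kappa_{31}\theta^3\phi + 6\kappa_{22}\theta^2\phi^2
   + 4\kappa_{13}\theta\phi^3 + \kappa_{04}\phi^4\bigr) + \cdots\Bigr\}$$

$$\qquad\times \cos\Bigl\{\frac{1}{6\sqrt{t}}\bigl(\kappa_{30}\theta^3 + 3\kappa_{21}\theta^2\phi
   + 3\kappa_{12}\theta\phi^2 + \kappa_{03}\phi^3\bigr) + \cdots\Bigr\}\,d\theta\,d\phi.$$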

If we now define ¢; , ¢2 and jp by the equations 


_ Ko2 (20 us — Kil 


oS" S44 ee 

where $\Delta = \kappa_{20}\kappa_{02} - \kappa_{11}^2$, we may apply the formulae for
the moments of a bivariate normal distribution as given, for example, by Kendall
[16], Sec. 3.29 and Exercise 3.15, and we formally obtain the result (6.10). In
order to check the algebra it may be observed that, when α = 1, the sum of the
moduli of the first five coefficients of the λ's is 48 and the sum of the rest
is 320, while the algebraic sums are both zero. These facts can be inferred from
the above argument without going through all the details.

An application of Theorem 6.2 is given in Sec. 8. In this application it is 
found that the coefficient of 1/(24t) is very ill-conditioned. It may be possible 
to write it in a well-conditioned form. 

6.3. The asymptotic expansion of a multiple integral. (Compare Hsu [12], 
Rooney [22].) 

THEOREM 6.3. Let

$$f(x_1, x_2, \cdots, x_l) = \sum a(r_1, r_2, \cdots, r_l)\,x_1^{r_1} x_2^{r_2} \cdots x_l^{r_l}$$

(or $\int\cdots\int a(r_1, \cdots, r_l)\,x_1^{r_1}\cdots x_l^{r_l}\,dr_1\cdots dr_l$,
where $a(r_1, \cdots, r_l)$ is continuous), where the summation is over all
integral lattice-points in l-dimensional space, and where

(6.14) $a(r_1, r_2, \cdots, r_l) \ge 0$;

(6.15) the series (integral) is convergent in some non-vanishing open
l-dimensional domain;

(6.16) there exist positive integers $R_{10}, R_{20}, \cdots, R_{l0}$ such that every
point $(R_1, R_2, \cdots, R_l)$ with $R_1\epsilon_1 > R_{10}$, $R_2\epsilon_2 > R_{20}$,
···, $R_l\epsilon_l > R_{l0}$, where each ε is either +1 or −1, can be expressed as a
linear combination of points $(r_1, r_2, \cdots, r_l)$ for which
$a(r_1, r_2, \cdots, r_l) > 0$, the coefficients in these linear combinations
being positive integers. In other words the suffixes corresponding to positive
coefficients in f span all points sufficiently far from the origin in at least
one "octant" (or $2^l$-ant). (For better understanding the reader may take
$\epsilon_j = 1$ for all j.) There may be as many as $2^l$ octants for which this
condition is valid. (In the "continuous" case of the theorem this condition does
not require explicit mention.)

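Let the coefficient of $x_1^{M_1} x_2^{M_2} \cdots x_l^{M_l}$ in
$(f(x_1, x_2, \cdots, x_l))^t$ be $c(M_1, M_2, \cdots, M_l;\, t)$, i.e.,

$$(f(x_1, \cdots, x_l))^t = \sum_{M_1=-\infty}^{\infty}\cdots\sum_{M_l=-\infty}^{\infty}
   c(M_1, \cdots, M_l;\, t)\,x_1^{M_1}\cdots x_l^{M_l}$$

(or $(f(x_1, \cdots, x_l))^t = \int\cdots\int c(M_1, \cdots, M_l;\, t)\,x_1^{M_1}\cdots x_l^{M_l}\,dM_1\cdots dM_l$).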


Suppose further that we restrict our attention to a class of values of
$(M_1, M_2, \cdots, M_l)$ for which

(6.17) $M_j/t$ (j = 1, 2, ···, l) all lie in fixed, not necessarily finite,
intervals;

(6.18) $\operatorname{sgn} M_j = \epsilon_j$ (j = 1, 2, ···, l);

(6.19) $c(M_1, M_2, \cdots, M_l;\, t) \ne 0$ at any point satisfying the above
conditions, if t is large enough.

Then there is at most one ordered set of l non-negative real numbers
$p_1, p_2, \cdots, p_l$ such that

$$t\,p_j\,\frac{\partial}{\partial p_j} f(p_1, \cdots, p_l) = M_j f(p_1, \cdots, p_l)
   \qquad (j = 1, 2, \cdots, l), \tag{6.20}$$

and if there is such a set of l numbers, then uniformly

$$c(M_1, M_2, \cdots, M_l;\, t) \sim \frac{[f(p_1, p_2, \cdots, p_l)]^t}
   {(2\pi t)^{l/2}\,p_1^{M_1}\cdots p_l^{M_l}\sqrt{\Delta}}, \tag{6.21}$$


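where

$$\Delta = \det\{\kappa_{jk}(p_1, \cdots, p_l)\}, \tag{6.22}$$

where

$$\kappa_{jk}(p_1, \cdots, p_l) = p_j\frac{\partial}{\partial p_j}
   \left(p_k\frac{\partial}{\partial p_k}\log f(p_1, \cdots, p_l)\right) \tag{6.23}$$

even if j = k. Or we can write

$$\kappa_{jk}(p_1, \cdots, p_l) = \left[\frac{\partial^2}{\partial\xi_j\,\partial\xi_k}
   \log f(e^{\xi_1}, \cdots, e^{\xi_l})\right]_{\xi_j = \log p_j\ (j = 1, \cdots, l)}. \tag{6.24}$$

In other words Δ is the Hessian of $\log f(e^{\xi_1}, \cdots, e^{\xi_l})$ at

$$(\xi_1, \cdots, \xi_l) = (\log p_1, \cdots, \log p_l).$$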
Notice that

$$f^* = f(e^{\xi_1}, \cdots, e^{\xi_l})\,e^{-(M_1\xi_1 + \cdots + M_l\xi_l)/t},$$

which is essentially of the form of a Laplace transform of a non-negative
function, is an analytic convex function of $(\xi_1, \xi_2, \cdots, \xi_l)$ in its
real domain of convergence, and so also is log f*. (Cf. Doetsch [2A], p. 58.)
It can be seen to be strictly convex if the points ("basis vectors") at which
$a(r_1, r_2, \cdots, r_l) > 0$ span a genuinely l-dimensional space, which they do
in virtue of condition (6.16). (It is also strictly convex in any linear
manifold that belongs to the boundary of the domain of convergence.) Hence the
Hessian of log f* (which is equal to Δ) is strictly positive at points of the
real domain of convergence of f, and stationary points of f*, even if they are
on the boundary of the domain, are necessarily minimum points. There cannot be
more than one stationary point. Certainly f* attains its minimum but this may be
on the boundary of the domain. It follows that the solution of (6.20) is unique
if it exists, and it will exist if f* attains its minimum at an interior point.
It may also exist if the minimum is on the boundary, provided that the minimum
is a stationary point as in an example considered in Sec. 8 in relation to the
distribution of χ². In any practical problem if the equations (6.20) can be
solved there are no further difficulties.

It can be seen that condition (6.16) is equivalent to the statement that every
point in the l-dimensional lattice can be expressed linearly, with integral
coefficients (not necessarily positive), in terms of the "basis vectors"

$$(r_1, r_2, \cdots, r_l)$$

for which $a(r_1, r_2, \cdots, r_l) > 0$. If this condition were not satisfied,
then at least one of the vectors (1, 0, 0, ···, 0), (0, 1, 0, ···, 0),
(0, 0, 1, ···, 0), etc., would not be "spanned," say (1, 0, 0, ···, 0) for
definiteness. In this case there would be a smallest positive integer $r_0$ such
that all multiples of $(r_0, 0, 0, \cdots, 0)$ would be spanned and no other points
of the form (r, 0, 0, ···, 0). Then there would be at least $r_0$ values of
$\theta_1$, not congruent modulo 2π, at which

$$f(p_1 e^{i\theta_1}, p_2 e^{i\theta_2}, \cdots, p_l e^{i\theta_l})$$

would be of maximum modulus, i.e., at least $r_0$ equally important saddlepoints,
and the asymptotic formula (6.21) would need modification.

The remaining details of rigour in the proof of Theorem 6.3 may be supplied 
along the lines of Daniels [2], whose use of Lagrange’s expansion must be now 
replaced by the multidimensional form that is given in Sec. 104 of Goursat [6] 
(and attributed to Laplace). 

If in Theorem 6.3 the power $f^t$ were replaced by the product of t distinct
power series, and if moreover the second term in the asymptotic expansion were
obtained then we should have a theorem that could be used in conjunction with
(4.7) for approximation to the distribution of χ² for contingency tables. If only
the first term of the asymptotic expansion were available we should merely
arrive at the familiar gamma-variate approximation.


7. Asymptotic expansion of $P_0(m \mid N, t)$. We pointed out in Sec. 4.1 that
$P_0(m \mid N, t)$ is equal to the coefficient of $x^N$ in

$$\frac{N!}{t^N}\left(1 + x + \frac{x^2}{2!} + \cdots + \frac{x^m}{m!}\right)^t.$$

We may therefore make use of Theorem 6.1 with

$$f(x) = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^m}{m!}.$$

Equation (6.1) becomes

$$t\left(p + p^2 + \frac{p^3}{2!} + \cdots + \frac{p^m}{(m-1)!}\right)
   = N\left(1 + p + \frac{p^2}{2!} + \cdots + \frac{p^m}{m!}\right), \tag{7.1}$$

which, when N and t are numerically assigned, can be solved by any method
for the numerical solution of algebraic equations. It is then a straightforward
calculation to apply Theorem 6.1, and it could be done on a general-purpose
computer for any specified values of t, N, and m.
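As an illustration of the numerical step just described, the sketch below solves the saddle-point equation t p f'(p) = N f(p) for the truncated exponential series f(p) = 1 + p + ··· + p^m/m!. It is a minimal example under stated assumptions, not the paper's method: bisection is simply one convenient choice, the function names are hypothetical, and a root is assumed to exist (0 < N < mt).

```python
import math


def f_and_pf_prime(p: float, m: int):
    """Return f(p) = sum_{k<=m} p^k/k! and p*f'(p) = sum_{k<=m} k p^k/k!."""
    f = sum(p**k / math.factorial(k) for k in range(m + 1))
    pf1 = sum(k * p**k / math.factorial(k) for k in range(1, m + 1))
    return f, pf1


def solve_71(N: float, t: float, m: int, hi: float = 1e6) -> float:
    """Bisection for the positive root p of t*p*f'(p) = N*f(p), assuming 0 < N < m*t."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f, pf1 = f_and_pf_prime(mid, m)
        if t * pf1 < N * f:
            lo = mid  # p*f'/f increases with p, so the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(solve_71(8, 8, 2))  # the case m = 2, N = t
    print(solve_71(4, 8, 2))
```

For m = 2 and N = t the equation reduces to p²/2 = 1, so the routine should return √2, which gives a quick sanity check.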

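As a detailed example take m = 2 and N = t. Then

$$p = \sqrt{2}$$

and it can be shown that

$$\log f(pe^{u}) = \log f(p) + u + \log\bigl[1 + (4 - 2\sqrt{2})\sinh^2\tfrac12 u\bigr].$$

Hence

$$\kappa_1 = 1, \quad \kappa_3 = \kappa_5 = \cdots = 0,$$

$$\kappa_2 = 2 - \sqrt{2}, \quad \kappa_4 = -16 + 11\sqrt{2}, \quad \kappa_6 = 512 - 361\sqrt{2},$$

$$\lambda_4 = \tfrac12(-4 + \sqrt{2}), \quad \lambda_6 = \tfrac12(33 - 13\sqrt{2}),$$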
and Theorem 6.1 then gives

$$P_0(2 \mid t, t) \sim \left(\frac{2 + \sqrt{2}}{e\sqrt{2}}\right)^{t}\frac{1}{\sqrt{2 - \sqrt{2}}}
   \left\{1 - \frac{8 - 3\sqrt{2}}{48t} - \frac{96\sqrt{2} - 113}{2304t^2} - \cdots\right\}. \tag{7.2}$$

TABLE 1

  t    One term    Two terms    Three terms    Exact
  1                1.070
  2                0.990
  3                0.8914       0.8904         0.8888
  4                0.79703      0.79652        0.7968750
  5                0.71326      0.7104105      0.71040
 10                0.39585      0.3958103      0.395811360

In Table 1 the results of taking one, two and three terms of (7.2) are given in
the second, third, and fourth columns, while the last column gives the exact
value of $P_0(2 \mid t, t)$. It seems fair to say that, when using formula (7.2),
we may regard 4 as a large number and 10 as very large indeed.
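The comparison in Table 1 can be recomputed with a short script. The sketch below is illustrative only. It assumes that (7.2) reads ((2+√2)/(e√2))^t (2−√2)^(−1/2) {1 − (8−3√2)/(48t) − (96√2−113)/(2304t²)}, and that P_0(m | N, t) is N!/t^N times the coefficient of x^N in (1 + x + ··· + x^m/m!)^t, as stated at the start of this section. The function names are hypothetical.

```python
import math
from fractions import Fraction


def exact_p0(m: int, N: int, t: int) -> Fraction:
    """N!/t^N times the coefficient of x^N in (sum_{k<=m} x^k/k!)^t, in exact arithmetic."""
    base = [Fraction(1, math.factorial(k)) for k in range(m + 1)]
    poly = [Fraction(1)]
    for _ in range(t):
        new = [Fraction(0)] * min(len(poly) + m, N + 1)
        for i, a in enumerate(poly):
            for k, b in enumerate(base):
                if i + k <= N:  # powers above x^N are never needed
                    new[i + k] += a * b
        poly = new
    coeff = poly[N] if N < len(poly) else Fraction(0)
    return coeff * math.factorial(N) / Fraction(t) ** N


def approx_72(t: int, terms: int) -> float:
    """First `terms` terms (1 to 3) of the assumed form of (7.2)."""
    r2 = math.sqrt(2.0)
    lead = ((2 + r2) / (math.e * r2)) ** t / math.sqrt(2 - r2)
    corr = [1.0, -(8 - 3 * r2) / (48 * t), -(96 * r2 - 113) / (2304 * t * t)]
    return lead * sum(corr[:terms])


if __name__ == "__main__":
    for t in (1, 2, 3, 4, 5, 10):
        row = [approx_72(t, k) for k in (1, 2, 3)]
        print(t, [round(v, 6) for v in row], float(exact_p0(2, t, t)))
```

Exact rational arithmetic is used for the coefficient so that the last column is free of rounding error; only the asymptotic terms are evaluated in floating point.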

When N is not necessarily equal to t, the first term of the asymptotic formula
is


: r\ N+} 9 +. /2 , (t a N)? 
(73) P,(2|N,t) ~ (=) a 


a ——> eOXD ( — 7 a fe} 
t (evV/2)* V2 — v2 : 2(2 — ¥2)t 
For ¢ = 8 we have the following numerical values. 
2 4 5 3 { i2 4 
1.4 988 dda 506 -074 O14 
1.0 .943 76! .f .237 .070 .010 


where row (i) is obtained from (7.3) and row (ii) is the value of
$P_0(2 \mid N, t)$ correct to three decimal places, computed directly from the
generating function.


8. Asymptotic expansion for the distribution of chi-squared. It was pointed
out in Sec. 4.3 that for the equiprobable multinomial distribution,

$$\chi^2 = \frac{t}{N}S - N, \quad \text{where } S = \sum_{r=1}^{t} n_r^2, \tag{8.1}$$

and that

$$\Pr(S = M) = \text{coefficient of } x^M y^N \text{ in } \frac{N!}{t^N}(f)^t, \tag{8.2}$$

where

$$f(x, y) = \sum_{n=0}^{\infty} \frac{x^{n^2} y^{n}}{n!}.$$
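The equations for p and p', in Theorem 6.2, are

$$t\sum_{n=0}^{\infty} \frac{n^2\,p^{n^2} p'^{\,n}}{n!} = M\sum_{n=0}^{\infty} \frac{p^{n^2} p'^{\,n}}{n!}, \tag{8.3}$$

$$t\sum_{n=0}^{\infty} \frac{n\,p^{n^2} p'^{\,n}}{n!} = N\sum_{n=0}^{\infty} \frac{p^{n^2} p'^{\,n}}{n!}. \tag{8.4}$$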


When M is a possible value for S these two equations have a unique solution, 
and this solution could be obtained by means of an iterative process on 
a general-purpose computer. (Each equation can be shown to determine p 
uniquely given p’ and conversely.) 

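As an example that can be worked out by hand calculation, we consider the
special case

$$M = 2t, \quad N = t,$$

when

$$p = p' = 1.$$

(p = 1 is on the boundary of convergence of f, but this does not affect the
validity of Theorem 6.2.) We have

$$e^{-1} f(e^{\xi}, e^{\eta}) = e^{-1}\sum_{n=0}^{\infty} \frac{e^{n^2\xi + n\eta}}{n!}
   = \sum_{r,s} \frac{\mu'_{rs}\,\xi^r \eta^s}{r!\,s!}, \text{ say,}$$

where

$$\mu'_{rs} = e^{-1}\sum_{n=0}^{\infty} \frac{n^{2r+s}}{n!} = b_{2r+s},$$

and where

$$e^{e^x} = e\left(b_0 + b_1 x + \frac{b_2 x^2}{2!} + \cdots\right);$$

$$b_n = b_{n-1} + \binom{n-1}{1} b_{n-2} + \binom{n-1}{2} b_{n-3} + \cdots + b_0,$$

$$b_0 = b_1 = 1, \quad b_2 = 2, \quad b_3 = 5, \quad b_4 = 15,$$

$$b_5 = 52, \quad b_6 = 203, \quad b_7 = 877, \quad b_8 = 4140.$$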
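Write $\mu_{rs}$ for the product moments about the mean of the (artificial)
distribution with probability generating function $e^{-1}f(x, y)$, and we find by
using known relationships between bivariate moments, moments about means and
cumulants (see, for example, Kendall [16] Sec. 3.29 and exercise 3.15, and
Kendall [15]),

$$\mu_{rs} = \sum_{i=0}^{r}\sum_{j=0}^{s} \binom{r}{i}\binom{s}{j}
   (-\mu'_{10})^{r-i}(-\mu'_{01})^{s-j}\mu'_{ij}
   = e^{-1}\sum_{n=0}^{\infty} \frac{(n^2 - 2)^r (n - 1)^s}{n!}.$$

Hence

$$\mu_{31} = 488, \quad \mu_{22} = 90,$$

$$\kappa_{20} = 11, \quad \kappa_{11} = 3, \quad \kappa_{02} = 1;$$

$$\kappa_{30} = 129, \quad \kappa_{21} = \mu_{21} = 25, \quad \kappa_{12} = \mu_{12} = 5, \quad \kappa_{03} = 1,$$

$$\kappa_{40} = \mu_{40} - 3\mu_{20}^2 = 2465, \quad \kappa_{31} = \mu_{31} - 3\mu_{20}\mu_{11} = 389,$$

$$\kappa_{22} = \mu_{22} - \mu_{20}\mu_{02} - 2\mu_{11}^2 = 61, \quad \kappa_{13} = \mu_{13} - 3\mu_{02}\mu_{11} = 9,$$

$$\kappa_{04} = \mu_{04} - 3\mu_{02}^2 = 1.$$

$$\Delta = 2,$$

$$\lambda_{30} = 129/(2\sqrt{2}), \quad \lambda_{21} = (25\sqrt{11})/(2\sqrt{2}), \quad \lambda_{12} = 55/(2\sqrt{2}),$$

$$\lambda_{03} = (11\sqrt{11})/(2\sqrt{2});$$

$$\lambda_{40} = 2465/4, \quad \lambda_{31} = (389\sqrt{11})/4,$$

$$\lambda_{22} = 671/4, \quad \lambda_{13} = (99\sqrt{11})/4, \quad \lambda_{04} = 121/4.$$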
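The cumulants in this example come from the joint distribution of (n², n) when n is Poisson with mean 1, so they can be rechecked mechanically from the numbers b_k. The sketch below is illustrative only. It assumes the raw moments are μ'_rs = b_(2r+s), uses the standard relations between central moments and cumulants up to order four, and the helper names are hypothetical.

```python
from math import comb


def bell_numbers(count: int):
    """b_0 .. b_{count-1} via b_n = sum_k C(n-1, k) b_{n-1-k}."""
    b = [1]
    for n in range(1, count):
        b.append(sum(comb(n - 1, k) * b[n - 1 - k] for k in range(n)))
    return b


B = bell_numbers(9)


def raw(r: int, s: int) -> int:
    """Raw product moment E[(n^2)^r n^s] for Poisson(1) n, equal to b_{2r+s}."""
    return B[2 * r + s]


def mu(r: int, s: int) -> int:
    """Central product moment E[(n^2 - 2)^r (n - 1)^s] by binomial expansion."""
    m10, m01 = raw(1, 0), raw(0, 1)
    return sum(
        comb(r, i) * comb(s, j) * (-m10) ** (r - i) * (-m01) ** (s - j) * raw(i, j)
        for i in range(r + 1)
        for j in range(s + 1)
    )


kappa = {
    (2, 0): mu(2, 0), (1, 1): mu(1, 1), (0, 2): mu(0, 2),
    (3, 0): mu(3, 0), (2, 1): mu(2, 1), (1, 2): mu(1, 2), (0, 3): mu(0, 3),
    (4, 0): mu(4, 0) - 3 * mu(2, 0) ** 2,
    (3, 1): mu(3, 1) - 3 * mu(2, 0) * mu(1, 1),
    (2, 2): mu(2, 2) - mu(2, 0) * mu(0, 2) - 2 * mu(1, 1) ** 2,
    (1, 3): mu(1, 3) - 3 * mu(0, 2) * mu(1, 1),
    (0, 4): mu(0, 4) - 3 * mu(0, 2) ** 2,
}
delta = kappa[(2, 0)] * kappa[(0, 2)] - kappa[(1, 1)] ** 2

if __name__ == "__main__":
    print(B)
    print(kappa, delta)
```

Integer arithmetic keeps every intermediate value exact, which matters here because the text warns that the final coefficient is obtained by cancellation between large terms.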
Theorem 6.2 now shows that

$$\Pr(\chi^2 = t \mid N = t) \sim \frac{1}{2\sqrt{\pi t}}\left(1 + \frac{1}{4t} + \cdots\right). \tag{8.5}$$


The coefficient of 1/(24¢) in Theorem 6.2 reduces to 4 in this example, al- 
though one of its terms is over 30,000. The mere fact that (8.5) looks sensible 
is therefore quite a good check of the arithmetic and algebra. It is important 
to remember for future applications that the coefficient of 1/¢ in Theorem 6.2 
is liable to be ill-conditioned, especially for machine programming. 

It seems likely that the application of the theorem to χ² would give better
results for its cumulative distribution than for the individual probabilities
Pr[χ² = (tM/N) − N]. This opinion is supported by the earlier discussion of
the lattice-point problem. When t is small, say t = 2, the first term, $c_N$ say,
of the asymptotic formula for the lattice-point problem is misleading since no
prime of the form 4n + 3 can be expressed as the sum of two squares; but
$c_1 + c_2 + \cdots + c_N$ gives a good approximation to the number of lattice
points in the circle $x^2 + y^2 \le N$.


9. Some exact formulae for Po(2 | N, ¢). In order to illustrate the ‘discrete 
method” of section 5.2 we now consider the probability Po(2 | N, ¢) in more 
detail. 


We have (with p = +/2), from (4.2), 


ry Fp 10 9 9 ,18 t 
P,(2| N,t) = N! ‘= af (2¢ t+ 2e + 2¢ ) dé 
«Tp -F 


Stn eN—t)id 







We can apply (5.2) with 
u > l + N ete t 
and we obtain the following exact finite series. 


ren N1(2 + +/2)' 


NDI 


P,(2 | 1 


> "| ( 


{1 — (4 — 2v/2) sin’ =) 
/ 


2ar(t — N) 
— a - 


“\1+2 > cos 
r=] 


+ (—1)"(3 — 2v/2)‘e}, 


where 


u is odd 


u is even. 
It follows that

$$\frac{P_0(2 \mid 2t - N, t)}{P_0(2 \mid N, t)} = \frac{(2t - N)!}{N!\,(2t^2)^{t-N}}, \tag{9.2}$$

a formula that also follows directly from the fact that
$t^N P_0(2 \mid N, t)\,2^{N/2}/N!$ is the coefficient of $z^N$ in
$(1 + \sqrt{2}z + z^2)^t$, which is equal to that of $z^{2t-N}$ by symmetry.

Some further combinatorial formulae, which we give here without proof, are

$$P_0(1 \mid N, t) = \frac{(t-1)!}{(t-N)!\,t^{N-1}}, \tag{9.3}$$

$$P_0(2 \mid N, t) = \frac{N!\,t!}{t^N}\sum_{s} \frac{1}{(N - 2s)!\,(t - N + s)!\,s!\,2^s}, \tag{9.4}$$





N 
P,(m| N, t) (1 _ ) (m|N,t — 1) 


N\ 1 ere ; 
Sa ro (; F 1 nd a P,(m|N — m,t — 1) 
m/ \t t . ; ey 


1 N-—m 
1 — 7) Pom — 1|N — m,t — Ll) 


t N! 1 2 N-—2m : 
cacinciaieneaitittnasliniaatinaa la tiila le i iainaaiisieseiee, \ jill itil cee ite , oboe i, 
+ (5) mim\N — 2m)! @ (: *) Pim = 1|N — am, — 2) 


+(S)am ee (: 3 
3/ mim!im\(N — 3r)! 6 
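The closed forms of this section can be checked by direct enumeration for small N and t. The sketch below is a hedged illustration only. It assumes that P_0(m | N, t) is the probability that no cell receives more than m of N objects placed independently into t equiprobable cells, that (9.3) and (9.4) read P_0(1 | N, t) = (t−1)!/((t−N)! t^(N−1)) and P_0(2 | N, t) = (N! t!/t^N) Σ_s 1/((N−2s)!(t−N+s)! s! 2^s), and that (9.2) reads P_0(2 | 2t−N, t)/P_0(2 | N, t) = (2t−N)!/(N! (2t²)^(t−N)). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def p0_bruteforce(m: int, N: int, t: int) -> Fraction:
    """Enumerate all t^N placements and count those with every cell count <= m."""
    good = 0
    for assignment in product(range(t), repeat=N):
        counts = [0] * t
        for cell in assignment:
            counts[cell] += 1
        if max(counts) <= m:
            good += 1
    return Fraction(good, t**N)


def p0_one_closed(N: int, t: int) -> Fraction:
    """Assumed form of (9.3)."""
    if N > t:
        return Fraction(0)
    return Fraction(factorial(t - 1), factorial(t - N) * t ** (N - 1))


def p0_two_closed(N: int, t: int) -> Fraction:
    """Assumed form of (9.4): s cells hold two objects, N - 2s hold one."""
    total = Fraction(0)
    for s in range(N // 2 + 1):
        empty = t - N + s
        if empty < 0:
            continue
        total += Fraction(1, factorial(N - 2 * s) * factorial(empty) * factorial(s) * 2**s)
    return total * factorial(N) * factorial(t) / Fraction(t) ** N


def ratio_92(N: int, t: int) -> Fraction:
    """Assumed right-hand side of (9.2)."""
    return Fraction(factorial(2 * t - N), factorial(N)) / Fraction(2 * t * t) ** (t - N)


if __name__ == "__main__":
    print(p0_bruteforce(2, 4, 4), p0_two_closed(4, 4))
    print(p0_two_closed(4, 3) / p0_two_closed(2, 3), ratio_92(2, 3))
```

Enumeration costs t^N operations, so it is only a check for small cases; the closed forms are what one would use in practice.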
10. Combinatorial formulae for chi-squared. In most statistical work the
interesting values of χ² (or equivalently of S) are those greater than the
expectation given the null hypothesis. If we are interested in Pr(S = M), where
M is greater than the expectation, then Pr(S = 2M) + Pr(S = 3M) + ··· will
be negligible and we get, from (5.2), to an adequate approximation (writing
ω = exp(2πi/M)),

Pr (S = M) = Pr (S = 0 (mod M)) 


' 


~ iy-M 


times the coefficient of y° in 


M—l1 1 M—1 M—1 ae 7 
eX + D E ao uu oe | 


m=l1 r=( 


if M is odd. The expression $\sum_{r=0}^{M-1} \omega^{mr^2}$ is the Gaussian sum and
is equal to

$$(m/M)\sqrt{M}$$

or $i(m/M)\sqrt{M}$ according as M ≡ 1 or 3 (mod 4), where (m/M) is Legendre's
symbol. The question arises whether the methods of Vinogradov [24] could be
applied to the problem of the distribution of χ².
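The value quoted for the Gaussian sum is easy to confirm numerically. The sketch below is an illustration under assumptions not spelled out in the text: M is taken to be an odd prime and m is not a multiple of M, so that (m/M) is a genuine Legendre symbol, computed here by Euler's criterion. The function names are hypothetical.

```python
import cmath
import math


def gauss_sum(m: int, M: int) -> complex:
    """Sum over r = 0..M-1 of omega^(m r^2) with omega = exp(2 pi i / M)."""
    omega = cmath.exp(2j * math.pi / M)
    return sum(omega ** ((m * r * r) % M) for r in range(M))


def legendre(m: int, M: int) -> int:
    """Legendre symbol (m/M) for an odd prime M, via Euler's criterion."""
    value = pow(m, (M - 1) // 2, M)
    return -1 if value == M - 1 else value


def predicted(m: int, M: int) -> complex:
    """(m/M) sqrt(M) if M = 1 (mod 4), and i (m/M) sqrt(M) if M = 3 (mod 4)."""
    factor = 1 if M % 4 == 1 else 1j
    return factor * legendre(m, M) * math.sqrt(M)


if __name__ == "__main__":
    for M in (5, 7, 11, 13):
        for m in range(1, M):
            print(M, m, abs(gauss_sum(m, M) - predicted(m, M)))
```

Reducing the exponent modulo M before raising ω to it keeps the powers small and avoids accumulating rounding error for larger M.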
SOME PROBLEMS OF STOCHASTIC PROCESSES IN GENETICS
Motoo Kimura
Department of Genetics, University of Wisconsin

Summary. In genetics, stochastic processes arise at all levels of organization 
ranging from subunits of the gene to natural populations. Types of stochastic 
processes involved are also diverse. In the present paper, the following five 
topics have been selected for mathematical discussion and new results are pre- 
sented: (1) Random assortment of subunits of a gene. (2) Senescence in para- 
mecium due to random assortment of chromosomes. (3) Process of natural 
selection in a finite population (interaction between selection and random ge- 
netic drift). (4) Chance of fixation of mutant genes. (5) Population structure and 
evolution. Finally it is pointed out that new mathematical techniques will be 
needed for a satisfactory treatment of Wright’s theory of evolution. 


"Elles n'auroient dû leur première origine qu'à quelques productions fortuites,
dans lesquelles les parties élémentaires n'auroient pas retenu l'ordre qu'elles
tenoient dans les animaux pères & mères: chaque degré d'erreur auroit fait une
nouvelle espèce: . . .


Des moyens différents des moyens ordinaires que la Nature emploie pour la
production des animaux, loin d'être des objections contre ce système, lui sont
indifférents, ou lui seroient plutôt favorables."*


Maupertuis (Oeuvres, 1756) 


1. Introduction. These words, written two centuries ago, foreshadow the sto- 
chastic nature of genetic and evolutionary processes. Actually, stochastic pro- 
cesses are found in all levels of organization with which genetics is concerned, 
in the gene, the cell, the organism, and the population. 

The types of stochastic processes involved are also diverse. Of special impor- 
tance is the Markov process, which Kolmogorov [1] called stochastic definite; 
the exact treatment of regular systems of inbreeding is a typical example of a 


Received March 26, 1956; revised May 14, 1957. 
* "They [i.e. species] would have owed their first origin only to certain
fortuitous productions, in which the elementary particles have not retained the
arrangement that they had in the father and mother animals: each grade of error
would have made a new species. . . .

"As to methods [e.g. parthenogenesis, fission] different from the ordinary
methods that Nature employs for the production of animals, far from being
objections to this theory, they are indifferent to it, or rather would be
favorable to it."
finite Markov chain, though workers in this field have seldom used such a
terminology [2] [3]. The fate of an individual mutant gene appearing in a
population may best be studied by the theory of branching processes. The probability
distribution of gene frequencies in natural populations is important in the 
mathematical theory of evolution developed by Fisher and Wright. It contains 
many difficult problems of continuous stochastic processes in which the Kol- 
mogorov equation plays a fundamental role [4] [5] [6] [7]. 

In the present paper a few topics will be selected from various levels of or- 
ganization ranging from subunits of the gene to natural populations. 


2. Random assortment of subunits in chromosome division. The idea that 
each gene is composed of a number of subunits is a natural one, since analogous 
situations are familiar in physics and chemistry. If the subunits are of two or 
more different kinds, they will be sorted out in the process of chromosome
division.

In fact, such a model was proposed more than three decades ago to explain 
the high mutability of the so-called mutable genes. Unfortunately, precise ex- 
perimental results in a few higher organisms apparently contradicted this 
model [8]. Later, however, as investigations of the finer structure of the chromo- 
some have developed, a multiple-strand structure has been revealed and this 
encouraged the formulation of the same type of model again. Matsuura and 
Suto [9], in order to explain certain irregular segregations in maize, assumed 
that each chromosome contains 8 strands and that mutation may affect any 
one of the 8 gene replicates. A similar model was used by Auerbach [10] to ex- 
plain the occurrence of mosaics in the offspring of Drosophila males treated 
with mustard gas. More recently, Friedrich-Freksa and Kaudewitz [11] carried
out an interesting experiment with Amoeba proteus treated with radioactive
P³², in which they assumed that sorting-out of the radiation-damaged strands
or subunits causes death to the organism in later generations.

Let us consider a model in which each chromosome consists of n subunits
and suppose that a mutation has occurred in one of them. The subunits duplicate
to produce 2n which separate at random into two groups of n subunits to
form the daughter chromosomes. Thus the total number of subunits per chromosome
is kept constant, but the number of the mutant subunits may change
from generation to generation due to random segregation. We follow a single
line of descent obtained by selecting randomly one of the pair of daughter
chromosomes in each generation. We shall designate by $E_i$ (i = 0, 1, ···, n)
the state in which a given chromosome contains exactly i mutant subunits. Let
$a_i^{(t)}$ be the probability that the chromosome is in the state $E_i$ at the tth
generation (assuming that the mutation occurred at t = 0). In the present model,
the transition probabilities $p_{j|i} = \Pr\{E_j \leftarrow E_i\}$
(i, j = 0, 1, 2, ···, n) are given by

$$p_{j|i} = \binom{2i}{j}\binom{2n - 2i}{n - j}\bigg/\binom{2n}{n}. \tag{2.1}$$
Thus the probability chain will be expressed by

$$a^{(t+1)} = P a^{(t)}, \tag{2.2}$$

in which $a^{(t)}$ is a column vector whose ith element is $a_i^{(t)}$ and P is an
(n + 1) × (n + 1) matrix whose element in the ith row, jth column is $p_{i|j}$
(i, j = 0, 1, ···, n). Obviously $E_0$ and $E_n$ are absorbing barriers and the
remaining states $E_1, \cdots, E_{n-1}$ are all transient. Eigenvalues of the
matrix P, which satisfy |P − λI| = 0, are

$$\lambda_r = 2^r\binom{2n - r}{n - r}\bigg/\binom{2n}{n} \qquad (r = 0, 1, \cdots, n). \tag{2.3}$$

This can be shown by following a procedure similar to that which Feller [5]
gave in an appendix of his paper (p. 244), noting that a non-trivial set of
$y_j$'s which satisfy

$$\sum_{j=0}^{n} p_{j|i}\,y_j = \lambda_r y_i$$

can be written in the form

$$y_j = \sum_{\nu=0}^{r} c_\nu\,j^{(\nu)} \qquad (c_\nu \text{ is a constant})$$

and that

$$\sum_{j=0}^{n} p_{j|i}\,j^{(\nu)} = (2i)^{(\nu)}\binom{2n - \nu}{n - \nu}\bigg/\binom{2n}{n},$$

where $j^{(\nu)} = j(j - 1)\cdots(j - \nu + 1)$ is the factorial of degree ν.
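The eigenvalue formula can be checked exactly for a small chromosome. The sketch below is illustrative and rests on two assumptions: that (2.1) is the hypergeometric law p_(j|i) = C(2i, j) C(2n−2i, n−j)/C(2n, n), and that (2.3) reads λ_r = 2^r C(2n−r, n−r)/C(2n, n). It builds P with rational entries and confirms that det(P − λ_r I) vanishes for every r. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def transition_matrix(n: int):
    """P[j][i] = Pr{E_j <- E_i} under the assumed hypergeometric form of (2.1)."""
    denom = comb(2 * n, n)
    return [
        [Fraction(comb(2 * i, j) * comb(2 * n - 2 * i, n - j), denom) for i in range(n + 1)]
        for j in range(n + 1)
    ]


def eigenvalue(n: int, r: int) -> Fraction:
    """Assumed form of (2.3)."""
    return Fraction(2**r * comb(2 * n - r, n - r), comb(2 * n, n))


def determinant(matrix) -> Fraction:
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, size):
            factor = a[r][c] / a[c][c]
            for k in range(c, size):
                a[r][k] -= factor * a[c][k]
    return det


def shifted(matrix, lam: Fraction):
    """Return matrix - lam * I."""
    return [
        [entry - (lam if i == j else 0) for j, entry in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


if __name__ == "__main__":
    n = 4
    P = transition_matrix(n)
    for r in range(n + 1):
        print(r, eigenvalue(n, r), determinant(shifted(P, eigenvalue(n, r))))
```

The eigenvalue 1 appears twice (r = 0 and r = 1), which matches the two absorbing states; the largest remaining one, λ_2 = 1 − 1/(2n − 1), governs the slowest decay.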

Though the general expressions for eigenvectors of P and its transpose P'
corresponding to these eigenvalues do not seem easily obtainable, we can obtain
numerical results for small values of n. Thus we can construct formulae
giving probabilities of various states at a given generation.

I have worked out the cases of n = 2, 3, 4, 5, 6, 8, and 16, the details of which
will be published elsewhere.

We are particularly interested in the probability ($d^{(t)}$) with which a mutant
chromosome (i.e., a chromosome containing only mutant subunits) first appears
by the sorting out process in a given generation t (t = 1, 2, 3, ···). This is
obtained as $d_n^{(t)} = a_n^{(t)} - a_n^{(t-1)}$ for the case of n subunits. For
n = 2, 4, 8 and 16, we have

$$d_2^{(t)} = \frac{1}{6}\left(\frac{2}{3}\right)^{t-1},$$

$$d_4^{(t)} = \frac{1}{3080}\left\{195\left(\frac{6}{7}\right)^{t-1} - 330\left(\frac{4}{7}\right)^{t-1}
   + 135\left(\frac{8}{35}\right)^{t-1}\right\},$$

$$d_8^{(t)} = \frac{1}{12870}\left\{(242.0)\left(\frac{14}{15}\right)^{t-1} - (709.1)\left(\frac{4}{5}\right)^{t-1}
   + (904.5)\left(\frac{8}{13}\right)^{t-1} - (686.9)\left(\frac{16}{39}\right)^{t-1} + \cdots\right\},$$

and

$$d_{16}^{(t)} = 0.0051626\left(\frac{30}{31}\right)^{t-1} - 0.0193172\left(\frac{28}{31}\right)^{t-1}
   + 0.0358431\left(\frac{728}{899}\right)^{t-1}$$

$$\qquad - 0.0457009\left(\frac{624}{899}\right)^{t-1} + 0.0451643\left(\frac{4576}{8091}\right)^{t-1} + \cdots.$$

TABLE 1

   n     t_max     d_max (%)
   2       1        16.667
   4       5         2.287
   8      15         0.481
  16      35         0.111

If the mutant chromosome causes death to the Amoeba as in the model of
Friedrich-Freksa et al. [11], $d^{(t)}$ gives the probability of a death at the tth
generation.

It may be expected on intuitive grounds that the larger the number of subunits,
the later will be the appearance of the mutant chromosome. One way of
expressing this tendency is to calculate the value of t which maximizes
$d_n^{(t)}$. Table 1 shows these values and corresponding values of $d_{max}$.
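The entries of Table 1 can be recomputed by iterating the chain directly, without the closed-form expressions. The sketch below is illustrative only. It assumes the hypergeometric transition law p_(j|i) = C(2i, j) C(2n−2i, n−j)/C(2n, n), starts from a single mutant subunit, and takes d_n^(t) as the increase of the probability of state E_n in generation t. The function names are hypothetical.

```python
from math import comb


def first_fixation_probabilities(n: int, generations: int):
    """Return [d_n^(1), ..., d_n^(generations)] starting from one mutant subunit."""
    denom = comb(2 * n, n)
    P = [
        [comb(2 * i, j) * comb(2 * n - 2 * i, n - j) / denom for i in range(n + 1)]
        for j in range(n + 1)
    ]
    a = [0.0] * (n + 1)
    a[1] = 1.0  # the mutation occurred in exactly one subunit at t = 0
    d = []
    for _ in range(generations):
        new = [sum(P[j][i] * a[i] for i in range(n + 1)) for j in range(n + 1)]
        d.append(new[n] - a[n])  # probability newly absorbed at E_n
        a = new
    return d


def peak(n: int, generations: int = 400):
    """Return (t_max, d_max) for the first-appearance probability."""
    d = first_fixation_probabilities(n, generations)
    t_max = max(range(len(d)), key=d.__getitem__) + 1
    return t_max, d[t_max - 1]


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        t_max, d_max = peak(n)
        print(n, t_max, round(100 * d_max, 3))
```

Because E_n is absorbing, the probability of being in E_n never decreases, so its increment in one generation is exactly the chance of first reaching E_n in that generation.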

Next we shall consider the situation in which n is very large. The proportion
of mutant subunits x = i/n (0 ≤ x ≤ 1) may be treated as a continuous variable
with good approximation. Let φ(x, t) be the probability density of x at
time t measured in generations. If δx is the amount of change in x per
generation,

$$E(\delta x) = 0, \quad E(\delta x)^2 = x(1 - x)/(2n - 1) \quad \text{and, for } k \ge 3,$$

$E(\delta x)^k$ is o(1/n). Therefore we can use the following differential equation
to obtain φ(x, t) (see [16]):

$$\frac{\partial\phi(x, t)}{\partial t} = \frac{1}{2(2n - 1)}\frac{\partial^2}{\partial x^2}
   \{x(1 - x)\phi(x, t)\} \qquad (0 < x < 1), \tag{2.4}$$

with the initial condition

$$\phi(x, 0) = \delta\left(x - \frac{1}{n}\right), \tag{2.5}$$

where δ is the Dirac function. The singular equation (2.4) is equivalent to the
one describing the process of random genetic drift in natural populations if we
put 2n − 1 = 2N, N being the effective population size. The complete solution
of this equation has been worked out (see Section 4). The points x = 0 and x = 1
act as absorbing barriers and the rate of fixation at x = 1 is given by
φ(1, t)/4N which reduces to

$$d_n^{(t)} = \frac{1}{4Nn}\sum_{i=1}^{\infty} (-1)^{i+1}\,i(i + 1)(2i + 1)\,e^{-i(i+1)t/4N}.$$
The value of t giving the maximum of $d_n^{(t)}$ is obtained by solving

$$1 - 15e^{-2\tau} + 84e^{-5\tau} - 300e^{-9\tau} + 825e^{-14\tau} - \cdots = 0,$$

where τ = t/2N. The required root of this transcendental equation can be
obtained numerically; τ = 1.2940···.
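The root quoted above can be reproduced with a few lines of code. The sketch below is illustrative. It assumes that the equation arises from differentiating a series of the form Σ (−1)^(i+1) i(i+1)(2i+1) e^(−i(i+1)τ/2) with respect to τ, so that after dividing by the first term the i-th coefficient is i²(i+1)²(2i+1)/12 and the i-th exponent is (i(i+1)/2 − 1)τ. This reproduces the printed coefficients 1, 15, 84, 300, 825. The function names are hypothetical.

```python
import math


def drift_max_condition(tau: float, terms: int = 60) -> float:
    """Left-hand side 1 - 15 e^{-2 tau} + 84 e^{-5 tau} - 300 e^{-9 tau} + ..."""
    total = 0.0
    for i in range(1, terms + 1):
        coefficient = i * i * (i + 1) * (i + 1) * (2 * i + 1) / 12.0
        exponent = (i * (i + 1) / 2.0 - 1.0) * tau
        total += (-1) ** (i + 1) * coefficient * math.exp(-exponent)
    return total


def solve_tau(lo: float = 1.0, hi: float = 2.0) -> float:
    """Bisection on [lo, hi], where the left-hand side changes sign from - to +."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if drift_max_condition(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    tau = solve_tau()
    print("tau =", tau)
    print("t_max / n for large n is about", 2 * tau)
```

Since t = 2Nτ and 2N = 2n − 1, the root translates into t_max close to 2τ n for large n, which is where a proportionality constant near 2.59 comes from.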

From this we can derive the following remarkable asymptotic relations
(n → ∞):

$$t_{max} \sim 2.59n, \qquad d_{max} \sim 1.08/(4n^2). \tag{2.6}$$

Namely, $t_{max}$ will be proportional to the number of subunits and $d_{max}$ will
be inversely proportional to the square of that number.

Finally it is desirable to consider the cases in which the initial number of
mutant subunits is more than one, that is, some number, say np, where p is the
initial proportion of the subunits (0 < p < 1). The limiting probability of
absorption at n, starting at i, is the ith component of an eigenvector
corresponding to λ = 1, and from the third formula after equation (2.3) it
follows that p = i/n is the required probability. The total frequency of mutant
chromosomes in the tth generation will be expressed in the form:

$$f_n^{(t)}(p) = \sum_{s=1}^{t} d_n^{(s)} = p + \sum_{i=2}^{n} C_i(p, n)\lambda_i^t, \tag{2.7}$$

where the $C_i(p, n)$'s are functions of p and n but not of t. If the mutant
chromosome changes the fitness of its possessor, this type of formula must be
applied with caution. Generally $d^{(t)}$ should be used as a basis of comparing
expectation with observed results.

For very large n, (2.7) should approach

$$p + \sum_{i=1}^{\infty} (2i + 1)\,pq\,(-1)^i F(1 - i, i + 2, 2, p)\,e^{-i(i+1)t/4n}, \tag{2.8}$$

where q = 1 − p and F designates the hypergeometric function (see (5.3) in
Section 5).

The experimental data of Friedrich-Freksa [11] appear to agree with the
model for n = 16.


3. Senescence in Paramecium. It has been known to biologists for a long 
time that if cultures of the protozoon, Paramecium, are kept under exclusive 
asexual reproduction, they lose vigor and eventually die. This phenomenon is 
known as senescence or aging of paramecium and in fact is one of the old prob- 
lems in biology. Recently Dr. T. M. Sonneborn has made extensive studies of 
this phenomenon and discussed a hypothesis that the aging is due to an
accumulation of chromosome aberrations in the macronucleus (cf. e.g. Sonneborn
and Schneller [12]). Following the suggestion of Dr. J. Lederberg, I have tried
to work out the logical consequences of the stochastic model involved.
The macronucleus is considered to be polyploid, consisting of, say, m
chromosome sets each with n chromosomes. As in a polyploid nucleus of higher
plants, the various chromosomes are mixed at random inside the nucleus.⁴ If we
designate the chromosomes of a set by successive letters A, B, C, ···, N and
designate sets by subscripts, then the normal constituent of the nucleus will be
expressed in the form:

$$\begin{array}{ccccc}
A_1 & B_1 & C_1 & \cdots & N_1 \\
A_2 & B_2 & C_2 & \cdots & N_2 \\
\vdots & \vdots & \vdots & & \vdots \\
A_m & B_m & C_m & \cdots & N_m
\end{array}$$

We designate the total number of chromosomes by M (= mn).⁵

On this model we assume that at the division of the macronucleus, each 
chromosome duplicates itself followed by the random distribution of chromo- 
somes into two groups of equal number to form the daughter macronuclei. 
The death by aging is assumed to occur whenever chromosomes of any 
one type are lost entirely from the nucleus by chance. Various states of the 
nucleus will be expressed as n dimensional vectors. Here we have a hierarchical 
structure of absorbing barriers, and a direct attack on the problem may seem 
extremely difficult. However, because of the symmetry of the model, we can find 
an easier approach. Suppose we start from an individual with normal macro- 
nucleus, and each generation takes one of the daughters to continue the lineage. 
Our purpose is to calculate the probability that all the n chromosome types 
coexist in the individual at the ¢th generation. Since the process of loss of any 
type of chromosome is irreversible, we can treat the problem as if all possible 
chromosome constituents are viable and then remove unsuitable parts after- 
wards. 

Let us fix our attention on the tth generation. We designate by P; the probabil- 
ity that all the chromosomes except those of one specific type have been lost by 
that time, by P: the probability that all but 2 specific types have been lost and 
that these two coexist. Generally P; will be defined in a similar way. Since we 
can classify n chromosome types into two alternative groups like A vs. non A, 
A or B vs. neither A nor B, ete., 


$1 - P, ; 


o3 = (*) P, + (3) P. + (3) P3, ete., 


* This model is essentially different from the one considered by Kimball and Householder 
13] to explain the delayed lethal effect of radiation. 

5 n in this section has a different meaning from that of the previous section. Generally 
the same symbol in different sections may not have the same meaning. 
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n 1 . an ~~ 1). 
tia ("7 l ‘r+ ("5 2 ') Ps +: (71) Pas, 


where $\phi_i$ represents $f_M^{(t)}(i/n)$ in (2.7) $(i = 1, \ldots, n - 1)$. For example, $\phi_2$
is the probability that all the chromosomes except A or B or both have been lost
and this is a sum of the probabilities that all but A have been lost ($P_1$), that all
but B have been lost ($P_1$), and that A and B coexist but all others have been
lost ($P_2$). It is convenient to consider the above relations as a linear
transformation of the $P_i$'s into $\phi_i$'s with an $(n - 1) \times (n - 1)$ matrix whose element
in the ith row and jth column is $\binom{i}{j}$. The inverse transformation can be shown to
have a matrix whose element in the ith row and jth column is $(-1)^{i+j}\binom{i}{j}$. Let
$\Omega_t\ (= P_n)$ be the probability that all the chromosome types coexist in the
macronucleus at the tth generation. Since $\sum_{i=1}^{n}\binom{n}{i}P_i = 1$,

\[ \Omega_t = 1 - \sum_{i=1}^{n-1}\binom{n}{i}P_i = 1 - \sum_{j=1}^{n-1}(-1)^{n-1-j}\binom{n}{j}\phi_j . \]
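The binomial transformation and its inverse can be checked with exact rational arithmetic. The following is only a sketch: the function names and the values chosen for $P_1, \ldots, P_{n-1}$ are hypothetical, and $P_n$ is fixed by the normalization $\sum_i \binom{n}{i}P_i = 1$ used above.

```python
from fractions import Fraction
from math import comb

def phi_from_P(P):
    """phi_i = sum_{j=1}^{i} C(i, j) P_j for i = 1..len(P); P[0] holds P_1."""
    return [sum(comb(i, j) * P[j - 1] for j in range(1, i + 1))
            for i in range(1, len(P) + 1)]

def P_from_phi(phi):
    """Inverse transform: P_i = sum_{j=1}^{i} (-1)^(i+j) C(i, j) phi_j."""
    return [sum((-1) ** (i + j) * comb(i, j) * phi[j - 1] for j in range(1, i + 1))
            for i in range(1, len(phi) + 1)]

def omega_from_phi(phi, n):
    """Omega_t = 1 - sum_{j=1}^{n-1} (-1)^(n-1-j) C(n, j) phi_j."""
    return 1 - sum((-1) ** (n - 1 - j) * comb(n, j) * phi[j - 1]
                   for j in range(1, n))

n = 5
# Hypothetical values of P_1..P_{n-1}, chosen only for illustration.
P = [Fraction(1, 100), Fraction(1, 200), Fraction(1, 300), Fraction(1, 400)]
# P_n follows from the normalization sum_i C(n, i) P_i = 1.
P_n = 1 - sum(comb(n, i) * P[i - 1] for i in range(1, n))
phi = phi_from_P(P)

print(P_from_phi(phi) == P)            # the inverse matrix recovers the P_i
print(omega_from_phi(phi, n) == P_n)   # Omega_t computed from the phi_j equals P_n
```

Exact fractions are used so that the two identities hold without rounding error.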


Substituting $\phi_j = f_M^{(t)}(j/n)$ from (2.7) and noting that

\[ \sum_{j=1}^{n-1}(-1)^{n-1-j}\binom{n}{j}(j/n) = 1, \]

we have


n—l M . 
AG cena se 


j=l i=? 


ya ot (2M nays ) 
DR M-i M }° 


According to Sonneborn, the usual strains of Paramecium have a chromosome
number of n = 41, but there are also strains with n = 35 and 50. There
is good reason to believe that the macronucleus is at least 100-ploid (m = 100).
Thus the total number of chromosomes M in the macronucleus would be of the
order of 5000. This fact will enable us to use the asymptotic formula for
$f_M^{(t)}(j/n)$ given in (2.8).


\[ \Omega_t \sim \sum_{j=1}^{n-1}(-1)^{n+j}\binom{n}{j}\sum_{i=1}^{\infty}(2i + 1)(j/n)(1 - j/n)(-1)^i F(1 - i, i + 2, 2, j/n)\,e^{-i(i+1)t/4M}, \]

or if we put

\[ a_i = \sum_{j=1}^{n-1}(-1)^j\binom{n}{j}(j/n)(1 - j/n)F(1 - i, i + 2, 2, j/n), \]

\[ \Omega_t \sim \sum_{i=1}^{\infty}(2i + 1)(-1)^{n+i}a_i\,e^{-i(i+1)t/4M}. \tag{3.1} \]
=1 


It can be shown that $a_i = 0$ for $i < n - 1$. For $i \ge n - 1$,

\[ a_{n-1} = -\frac{(2n - 2)!}{(n - 1)!\,n^{n}}, \]

and in general, writing $i = (n - 1) + \nu$ $(\nu \ge 0)$,


-(n-l1!< ‘(ieee wor 
aa yes 1) pt+tn—-2 “GEpae lw Sette 


n—l 


where S;+,-: is Stirling’s number of the second kind defined by 


-e” 


v=] 


An-l+r = 


(see [14]). Examination of the absolute values of $a_{n-1+\nu}$ at $\nu = 1, 2, \ldots$ (small
values of $\nu$) suggests that they are at most of the order of 1/n relative to that of
$a_{n-1}$. This enables us to write down the following asymptotic formula for small
$\Omega_t$:

\[ \Omega_t \sim \frac{(2n - 1)!}{(n - 1)!}\left(\frac{1}{n}\right)^{n} e^{-(n-1)t/4m}, \tag{3.2} \]


or applying Stirling's formula for n!,

\[ \Omega_t \sim \frac{1}{\sqrt{2}}\exp\left\{(2\log 2 - 1)n - \frac{n - 1}{4m}\,t\right\}. \tag{3.2'} \]


Formula (3.2) can also be derived by a different method, by using the
multivariate Kolmogorov forward equation. To reach a given small probability of
survival ($\Omega$), the approximate number of generations required will be given by

\[ t = \frac{4m}{n - 1}\,(0.39n - 0.35 - 2.3\log_{10}\Omega). \]


In the case of m = 100, n = 41, we have $t = 156.4 - 23\log_{10}\Omega$ and the
generations giving 99%, 99.9% and 99.99% deaths are respectively about 202, 225,
and 248 generations. This agrees reasonably well with the finding of Sonneborn 
that under exclusively asexual reproduction, many of the lines die before 200 
fissions and almost all die before 324. 
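As a quick check of the figures just quoted, the generation formula can be evaluated directly. This is a minimal sketch; the function name `generations_to_survival` is a hypothetical label, and the constants 0.39, 0.35 and 2.3 are the rounded values given in the text.

```python
import math

def generations_to_survival(omega, m, n):
    """Generations t at which the survival probability has fallen to omega,
    using t = 4m/(n-1) * (0.39 n - 0.35 - 2.3 log10(omega))."""
    return 4.0 * m / (n - 1) * (0.39 * n - 0.35 - 2.3 * math.log10(omega))

for deaths in (0.99, 0.999, 0.9999):
    t = generations_to_survival(1.0 - deaths, m=100, n=41)
    print(f"{deaths:.2%} deaths: about {t:.0f} generations")
```

With m = 100 and n = 41 the prefactor 4m/(n - 1) equals 10, which is how the coefficients 156.4 and 23 arise.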

A slightly modified model was suggested to the author by Dr. J. Lederberg:
After chromosomes have reduplicated themselves in the macronucleus, they are 
distributed into two daughter nuclei in such a way that each chromosome has 
an independent and equal chance of going to either daughter. This differs from 
the previous model in that the total number of chromosomes per cell does not 
remain constant. This leads to the following asymptotic formula for the probabil- 
ity of survival: 


Q; yap (1 en em eam)= (t ee 2), 
For n = 41, m = 100, the numbers of generations for 99%, 99.9% and 99.99%
deaths are respectively about 175, 210, 246, rather similar to the previous model.

Also these models allow predictions for time of death of a lineage derived from 
repeated regeneration from a small fraction of the macronucleus and for segrega- 
tion of recessive factors, thus permitting two additional independent tests of the 
models by comparison with data. 


4. Process of natural selection in finite population (interaction between se- 
lection and random genetic drift). From the standpoint of population genetics, 
the most elementary step in evolution is the change in gene frequency, especially 
the one due to natural selection. It may not be difficult to imagine that the 
process of change is not entirely deterministic, since there exist various factors 
which introduce an element of indeterminacy into the process, among which 
random sampling of gametes due to finite population size is of special interest. 
Let A and A’ be a pair of alleles whose frequencies are respectively x and 1 - x
in the population. In natural populations, the number of individuals is usually
large and there may be overlapping of generations, so that gene frequency and
time parameter (t) may be treated as continuous variables with advantage.
We shall designate by $\phi(x, p; t)$ the probability density that the gene frequency
lies between x and x + dx at the tth generation given that the initial gene
frequency is p at t = 0.

The simplest situation is obviously that of pure random genetic drift in which 
no mutation, selection, or migration is involved. The gene frequency changes 
randomly from generation to generation due to random sampling of gametes in 
reproduction. In this case if N is the number of reproducing individuals in the 
random mating population, $\phi$ satisfies the following partial differential equation
[15], [16]

\[ \frac{\partial\phi}{\partial t} = \frac{1}{4N}\frac{\partial^2}{\partial x^2}\{x(1 - x)\phi\} \qquad (0 < x < 1), \tag{4.1} \]


with the initial condition 


\[ \phi(x, p; 0) = \delta(x - p), \]


where $\delta$ represents Dirac's delta function. Equation (4.1) is a special case of the
Kolmogorov forward (or Fokker-Planck) equation, and its pertinent solution is
given by

\[ \phi(x, p; t) = \sum_{i=1}^{\infty}\frac{(2i + 1)(1 - r^2)}{i(i + 1)}\,T_{i-1}^{1}(r)\,T_{i-1}^{1}(z)\,e^{-i(i+1)t/4N}, \tag{4.2} \]

where $r = 1 - 2p$, $z = 1 - 2x$ and $T_{i-1}^{1}(r)$ is the Gegenbauer polynomial
defined by

\[ T_{i-1}^{1}(r) = \frac{i(i + 1)}{2}\,F\!\left(i + 2,\ 1 - i,\ 2,\ \frac{1 - r}{2}\right). \]
Boundaries x = 0 and x = 1 act as absorbing barriers and as the number of
generations increases, the probability distribution of the classes in which A and
A’ coexist (“heterallelic,” or unfixed, classes) approaches a definite form and
decays at the constant rate of 1/2N. The process ultimately leads to complete
fixation or loss of one of the alleles.

When linear pressures (mutation, migration) are involved, the problem 
becomes a little more complicated. A thoroughgoing analysis of the solutions 
of the differential equations in this case has been made by Goldberg [17]. The 
present author also obtained the pertinent solution⁶ by studying the law of change
in the moments of the distribution [6]. Malécot [4] [18] studied interesting
problems of migration and decrease of correlation with distance in the case of
no selection.

For the evolution of the genetic system, however, natural selection which 
acts on mutant genes will be of utmost importance. The simplest situation here 
is genic selection in which no dominance exists. Suppose gene A has selective 
advantage s over A’, measured in Malthusian parameters [19], that is to say, 
the rate of geometric growth. The partial differential equation now becomes 


\[ \frac{\partial\phi}{\partial t} = \frac{1}{4N}\frac{\partial^2}{\partial x^2}\{x(1 - x)\phi\} - s\frac{\partial}{\partial x}\{x(1 - x)\phi\} \qquad (0 < x < 1), \tag{4.3} \]


with the same initial condition as before. Recently this equation was used by 
Wright and Kerr [20] in connection with their selection experiment in very small 
populations. The state of steady decay of the heterallelic classes was successfully 
analysed by Wright. The complete solution of the above equation, which re- 
duces to that of pure random drift for s = 0, is given in terms of oblate spheroidal 
functions studied by Stratton and others [21]: 


\[ \phi(x, p; t) = \sum_{k=0}^{\infty} C_k\,e^{-\lambda_k t + 2cx}\,V_{1k}^{(1)}(z), \]

where c = Ns and z = 1 - 2x. The spheroidal function $V_{1k}^{(1)}(z)$ is expressed as a
series of Gegenbauer polynomials

\[ V_{1k}^{(1)}(z) = \sum_{n=0,1}^{\infty}{}' f_n^k\,T_n^1(z), \]

where the $f_n^k$'s are constants, and the primed summation is over even values of n
if k is even, odd values of n if k is odd. For details of the solution see [7]. The
boundaries x = 0 and x = 1 act as absorbing barriers as in the preceding cases and
the gene A will ultimately be fixed in the population or completely lost from it.
The probability of fixation is larger, the larger the value of s.


6 Strictly speaking, the existing solution which treats boundaries as reflecting barriers
is not entirely satisfactory, because for small populations boundaries should act as elastic
barriers.
At the state of steady decay, the probability distribution $\phi$ decreases in value
at a constant rate $\lambda_0$:

\[ -\frac{1}{\phi}\frac{\partial\phi}{\partial t} = \lambda_0 . \]


For small values of c, we can expand $\lambda_0$ into a power series in c.


bes, es - ta 2-31 5 

(4.4) 2NrA = 1+ 75 “BS — c—-: 

This suggests that genic selection increases the rate of decay as compared with
the case of no selection (c = 0), at least when c is small. Values of $\lambda_0$ for larger
values of c will be found in the above reference [7], in which values of $2N\lambda_0$ up
to Ns = 8 have been studied.

Very often, however, there is some dominance between alleles, and usually 
“complete” dominance. The main purpose of this section is to develop a method 
to analyse this situation. 

Let us suppose that A is dominant over A’ and the dominant genotypes AA 
and AA’ have selective advantage s, measured in Malthusian parameters, over 
the homozygous recessive (A’A’). The differential equation for the probability 
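distribution $\phi$ is

\[ \frac{\partial\phi}{\partial t} = \frac{1}{4N}\frac{\partial^2}{\partial x^2}\{x(1 - x)\phi\} - s\frac{\partial}{\partial x}\{x(1 - x)^2\phi\} \qquad (0 < x < 1), \tag{4.5} \]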


with the initial condition

\[ \phi(x, p, 0) = \delta(x - p). \]

If we apply the transformation

\[ \phi = e^{-\lambda t}\,e^{2cx(1 - x/2)}\,w, \qquad x = (1 - z)/2, \]


to (4.5), we obtain the following ordinary differential equation:

\[ (1 - z^2)w'' - 4zw' + \left[\Lambda - 2 - \frac{c}{2}(z^2 - 1) + \frac{c^2}{4}(z^2 - 1)(1 + z)^2\right]w = 0, \tag{4.6} \]


in which $\Lambda = 4N\lambda$ and c = Ns. We note that for the case of no selection (c = 0)
the pertinent solution is the Gegenbauer polynomial. So we try to expand the
solution into a series of Gegenbauer polynomials, which are known to form a
complete orthogonal system in the interval [-1, 1]. Let

\[ w = \sum_{n=0}^{\infty} d_n T_n^1(z), \]

in which the $d_n$'s are constants. If we substitute this into (4.6) and use repeatedly
the recurrence relation,

\[ zT_n^1(z) = \frac{n + 2}{2n + 3}\,T_{n-1}^1(z) + \frac{n + 1}{2n + 3}\,T_{n+1}^1(z) \qquad (\text{set } T_{-1}^1(z) = 0), \]
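The recurrence relation can be checked against the hypergeometric definition of the Gegenbauer polynomials given with (4.2). The sketch below assumes that definition rewritten with n = i - 1, that is $T_n^1(z) = \tfrac{(n+1)(n+2)}{2}F(n + 3, -n, 2, (1 - z)/2)$; the function names are hypothetical.

```python
from fractions import Fraction

def hyp2f1_poly(a, b, c, x, terms):
    """Terminating Gauss hypergeometric sum F(a, b, c, x); b is a non-positive integer."""
    total, term = Fraction(0), Fraction(1)
    for k in range(terms):
        total += term
        term = term * (a + k) * (b + k) / ((c + k) * (k + 1)) * x
    return total

def T(n, z):
    """T^1_n(z) = (n+1)(n+2)/2 * F(n+3, -n, 2, (1-z)/2), with T^1_{-1} = 0."""
    if n < 0:
        return Fraction(0)
    return Fraction((n + 1) * (n + 2), 2) * hyp2f1_poly(n + 3, -n, 2, (1 - z) / 2, n + 1)

def recurrence_holds(n, z):
    """Check z T_n = (n+2)/(2n+3) T_{n-1} + (n+1)/(2n+3) T_{n+1} exactly."""
    lhs = z * T(n, z)
    rhs = Fraction(n + 2, 2 * n + 3) * T(n - 1, z) + Fraction(n + 1, 2 * n + 3) * T(n + 1, z)
    return lhs == rhs

z = Fraction(3, 7)  # arbitrary rational test point
print(T(1, z) == 3 * z, T(2, z) == Fraction(3, 2) * (5 * z**2 - 1))
print(all(recurrence_holds(n, z) for n in range(6)))
```

Rational arithmetic makes the comparison exact, so a failed check would point to a wrong coefficient rather than to rounding.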
we obtain a 9-term recursion formula for the $d_n$'s. Now we expand $\Lambda$ and the
$d_n$'s into power series of c:

\[ \Lambda = k_0 + k_1c + k_2c^2 + k_3c^3 + \cdots, \]
\[ d_1 = (a_1^{(1)}c + a_1^{(2)}c^2 + a_1^{(3)}c^3 + \cdots)\,d_0, \]
\[ d_2 = (a_2^{(1)}c + a_2^{(2)}c^2 + a_2^{(3)}c^3 + \cdots)\,d_0, \]
\[ d_3 = (a_3^{(2)}c^2 + a_3^{(3)}c^3 + a_3^{(4)}c^4 + \cdots)\,d_0, \]
\[ d_4 = (a_4^{(2)}c^2 + a_4^{(3)}c^3 + \cdots)\,d_0, \]
\[ d_5 = (a_5^{(3)}c^3 + \cdots)\,d_0, \quad \text{etc.,} \]

and substitute these into the recursion formula. By picking out coefficients of
equal powers of c, we can determine the k's and a's, by means of which the
eigenvalue $\lambda$ (or $\Lambda$) and the eigenfunction w are expressed. The most important
information is the smallest eigenvalue ($\lambda_0$) which gives the “rate of decay,” and
the corresponding eigenfunction. To get $\lambda_0$, we set $k_0 = 2$, since for c = 0,
$\Lambda\ (= 4N\lambda)$ should be 2, as shown in the previous treatment of pure random
drift in which the final rate of decay is 1/2N.

Though the calculation involved is quite tedious, we can obtain the desired
coefficients step by step. For the smallest eigenvalue $\lambda_0$ we have:
199 «2 17 3 23-41-29599 , 

a6 ie pe mh 


a ene © ana nn ae an 
2-5-7 2-55-7 23 - 38 - 56-73-11 


(4.7) 2NrA, = 1—4ce4+ 


The coefficients of the eigenfunction are: 


1 I 


a; = ’ 


2-3-7 


Ai 


"9 
or 


3.32-5-7’ 
43 = 249 X 10%, ete. 


The same method may be applied to get similar expansions for other eigenvalues 
and eigenfunctions. 
The shape of the distribution curve at the state of steady decay is given by

\[ \phi(x) = w_0\,e^{2cx(1 - x/2)}. \tag{4.8} \]

It will be convenient to adjust $d_0$ so that $\int_0^1\phi(x)\,dx = 1$ (fixed classes excluded).
The rate of fixation and loss of the gene A per generation at this state is given
by $\phi(1)/4N$ and $\phi(0)/4N$ and therefore

\[ 4N\lambda_0 = \phi(1) + \phi(0). \tag{4.9} \]

This can be derived from (4.5) noting that $\phi(x)$ is finite at the boundaries (x = 0
and x = 1), as shown for the case of no dominance in [7].
TABLE 2


2Ns = 1


.688


.142
.056
.990
.940
.903
.879
.865
.860
0.866


A numerical example will be given here. For weak selection favoring
dominants, 2Ns = 2c = 1, we get

\[ 2N\lambda_0 = 0.928 \]

and

\[ w_0 = d_0[T_0^1(z) - 0.0058\,T_1^1(z) - 0.0028\,T_2^1(z) + 0.0004\,T_3^1(z) + 0.00006\,T_4^1(z) + \cdots] \]

in which $T_0^1(z) = 1$, $T_1^1(z) = 3z$, $T_2^1(z) = \tfrac{3}{2}(5z^2 - 1)$, $T_3^1(z) = \tfrac{5}{2}(7z^3 - 3z)$, etc.
Values of $\phi(x)$ at x = 0, 0.1, 0.2, ... and 1 are listed in Table 2. They are adjusted
by Simpson's rule so that the area under the curve is unity. $\phi(1) + \phi(0)$ comes
out 1.855, while $4N\lambda_0$ is 1.856. The agreement is satisfactory for this level of
approximation. As a second example, we assume weak selection against the
dominants: 2Ns = 2c = -1. $2N\lambda_0$ is 1.128 and values of $\phi(x)$ are given in Table
2. In this case $\phi(1) + \phi(0)$ comes out 2.254 while $4N\lambda_0$ is 2.256. Again the
agreement is satisfactory.
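The first example can be retraced numerically from the quoted coefficients. The sketch below is an illustration under stated assumptions: it takes $\phi(x) = w_0 e^{2cx(1 - x/2)}$ with z = 1 - 2x, uses the rounded ratios $d_n/d_0$ printed above, and supplies $T_4^1(z) = \tfrac{15}{8}(21z^4 - 14z^2 + 1)$, the standard Gegenbauer polynomial of that degree, which the text does not write out. All function names are hypothetical.

```python
import math

C = 0.5  # c = Ns in the first example (2Ns = 2c = 1)
D_RATIOS = [1.0, -0.0058, -0.0028, 0.0004, 0.00006]  # d_n / d_0 as quoted in the text

def gegenbauer(n, z):
    """T^1_n(z) for n = 0..4; the n = 4 case is an assumption (not quoted in the text)."""
    return [1.0,
            3.0 * z,
            1.5 * (5.0 * z**2 - 1.0),
            2.5 * (7.0 * z**3 - 3.0 * z),
            1.875 * (21.0 * z**4 - 14.0 * z**2 + 1.0)][n]

def phi_unnormalised(x):
    """w_0(z) * exp(2 c x (1 - x/2)) with z = 1 - 2x and d_0 = 1."""
    z = 1.0 - 2.0 * x
    w = sum(d * gegenbauer(n, z) for n, d in enumerate(D_RATIOS))
    return w * math.exp(2.0 * C * x * (1.0 - x / 2.0))

def simpson(values, h):
    """Composite Simpson rule for an odd number of equally spaced values."""
    odd = sum(values[1:-1:2])
    even = sum(values[2:-1:2])
    return h / 3.0 * (values[0] + values[-1] + 4.0 * odd + 2.0 * even)

xs = [k / 10 for k in range(11)]
raw = [phi_unnormalised(x) for x in xs]
area = simpson(raw, 0.1)
phi = [value / area for value in raw]   # area under the curve adjusted to unity
boundary_sum = phi[0] + phi[-1]         # to be compared with 4N*lambda_0 = 1.856
print(round(phi[0], 3), round(boundary_sum, 3))
```

Because the coefficients are rounded to one or two significant figures, the result should be expected to match the quoted 1.855 only to about three figures.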

The above treatment leading to the power series expansion of eigenvalues and 
of coefficients of eigenfunctions is rather heuristic. For the more rigorous treat- 
ment of the problem, further investigation of these series will be required. 

As to uniqueness of the solutions of the type of singular partial differential 
equations considered in this section, an investigation could presumably be based 
on Section 23 of Feller’s paper [22]. 

The most remarkable fact suggested by the above analysis seems to be that
as compared with the case of pure random drift, selection toward dominants
(s > 0) decreases the final rate of decay, while selection against dominants
(s < 0) increases it. At least for weak selection the above results follow from
(4.7), since the most influential term $-c/5$ is negative if c (= Ns) is positive and
positive if c is negative.

TABLE 3


Classes    Frequencies %
7A
6A
5A
4A
3A
2A
1A
Total


For this continuous treatment to be applicable, the population number N
should be fairly large so that 1/N is negligible as compared with 1. If the
population is extremely small, we must treat the problem by the methods of finite
Markov chains. The transition probability that the number of A genes in the
population becomes j in the next generation, given that it is i in the present
generation will be given by

\[ p_{ij} = \binom{2N}{j}x'^{\,j}(1 - x')^{2N-j} \qquad (i, j = 0, 1, \ldots, 2N), \]

where $x' = x + \delta x$, in which $x = i/(2N)$, and $\delta x$ is the change of gene frequency
by selection per generation and is $sx(1 - x)^2$ if s is small. The rate of decay of
the unfixed classes and their limiting distribution may be obtained by iteration.
For example, if N = 4, 2Ns = 1, the limiting form of the distribution (fixed
classes excluded) becomes as follows (Table 3), with rate of decay ($\lambda_0$) 11.875%,
giving $2N\lambda_0 = 0.9500$. If there is no selection (s = 0), it turns out that the rate
of decay becomes 1/2N = 0.125 or 12.5%. Note that with selection for
dominants, the rate of decay is smaller.
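The iteration mentioned above can be sketched as a power iteration on the unfixed classes. This is an illustrative sketch rather than the author's computation: it assumes the binomial transition probabilities with $\delta x = sx(1 - x)^2$ exactly as stated, and the function names are hypothetical.

```python
from math import comb

def transition_matrix(N, s):
    """p_ij = C(2N, j) x'^j (1 - x')^(2N - j), x' = x + s x (1 - x)^2, x = i/2N."""
    size = 2 * N
    rows = []
    for i in range(size + 1):
        x = i / size
        x_new = x + s * x * (1.0 - x) ** 2
        rows.append([comb(size, j) * x_new**j * (1.0 - x_new) ** (size - j)
                     for j in range(size + 1)])
    return rows

def decay_rate(N, s, iterations=500):
    """Rate of decay and limiting distribution of the unfixed classes 1..2N-1."""
    P = transition_matrix(N, s)
    inner = range(1, 2 * N)
    v = [1.0 / (2 * N - 1)] * (2 * N - 1)   # start from a uniform distribution
    factor = 1.0
    for _ in range(iterations):
        new = [sum(v[a] * P[i][j] for a, i in enumerate(inner)) for j in inner]
        factor = sum(new)                    # fraction remaining unfixed after one step
        v = [value / factor for value in new]
    return 1.0 - factor, v

rate_neutral, _ = decay_rate(4, 0.0)
rate_selected, limiting = decay_rate(4, 1.0 / 8.0)   # N = 4, 2Ns = 1
print(rate_neutral, rate_selected)
print([round(100.0 * f, 2) for f in limiting])       # classes 1A..7A, in per cent
```

The neutral run should return 1/2N = 0.125; the run with selection can be compared with the 11.875% quoted in the text.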


5. Chance of fixation of mutant genes. In any large natural population, gene 
mutations may be occurring in each generation. Most of the mutant genes are 
likely to be deleterious but a few of them may turn out to be advantageous. 
Such advantageous mutant genes have a tendency to increase their frequencies 
in later generations thus having a positive chance of establishing themselves 
even in a very large population. Because of its importance in evolutionary 
genetics, the probability of fixation of mutant genes has been studied by Fisher 
[23], Haldane [24] and Wright [25] [26]. However, due to mathematical difficulties 
involved, so far only a few cases have been successfully worked out. 

In this section I will try to present the solution under quite general conditions 
and will show that the previous results are obtained as special cases. 

We will designate the selective advantage of the mutant homozygote (AA)
by s and that of the heterozygote (AA’) by sh. Let u(p, t) be the conditional
probability that the mutant gene reaches fixation by the tth generation, given 
that its initial frequency is p. Under the assumption of a continuous model and 
random mating it is possible to show that u(p, t) satisfies the following partial
differential equation:

\[ \frac{\partial u}{\partial t} = \frac{p(1 - p)}{4N}\frac{\partial^2 u}{\partial p^2} + sp(1 - p)\{h + (1 - 2h)p\}\frac{\partial u}{\partial p}. \tag{5.1} \]

Here we have inevitably the following boundary conditions:

\[ u(0, t) = 0, \qquad u(1, t) = 1. \tag{5.2} \]


For the special case of neutral genes (s = 0), the pertinent solution is

\[ u(p, t) = p + \sum_{i=1}^{\infty}(2i + 1)\,pq\,(-1)^i F(1 - i, i + 2, 2, p)\,e^{-i(i+1)t/4N}, \tag{5.3} \]

which agrees exactly with the results obtained by the study of moments [16].
Usually the process of evolution extends over an enormous period of time and
hence the probability of ultimate fixation will be of special importance. We will
designate such probability by u(p) which is defined by

\[ u(p) = \lim_{t\to\infty}u(p, t). \]

For the neutral mutant gene, u(p) = p. If $\nu$ is the initial number of mutant
genes, $u(p) = \nu/2N$ for this case and hence the probability of fixation per
mutant gene is 1/2N.

For the general case the probability may be obtained by setting $\partial u/\partial t = 0$
in (5.1). This leads to

\[ u(p) = \int_0^p e^{-2cDx(1-x) - 2cx}\,dx \Big/ \int_0^1 e^{-2cDx(1-x) - 2cx}\,dx, \tag{5.4} \]

where c = Ns and D = 2h - 1.

The rate of approach to the ultimate state of complete fixation or loss may be
given by the smallest eigenvalue $\lambda_0$ of equation (5.1). For a small value of c,
we can expand $\lambda_0$ into a power series in c as follows:

\[ 2N\lambda_0 = 1 + K_1c + K_2c^2 + K_3c^3 + K_4c^4 + \cdots, \tag{5.5} \]


| 


where 
Ky 


It should be noted that for the case of no dominance D = 0 and the above series
(5.5) agrees with (4.4) provided that 2s is used instead of s to express the selective
advantage of the homozygous mutants. For the case of complete dominance,
D = 1 or -1 according as the mutant gene is dominant or recessive.
In the former case of D = 1, (5.5) agrees with (4.7). Returning to formula (5.4)
we will consider a few cases of special importance in evolution. To obtain the
chance of fixation of an individual mutant gene, denoted by u, we may put p = 1/2N.
For the case of no dominance (D = 0), we have

\[ u = (1 - e^{-s})/(1 - e^{-2Ns}), \]

or denoting the selective advantage of the homozygote by 2s,

\[ u = (1 - e^{-2s})/(1 - e^{-4Ns}). \]

Thus for a slightly advantageous mutant gene we may write

\[ u = 2s/(1 - e^{-4Ns}) \tag{5.6} \]

with good approximation. The result agrees with Fisher [23] who used the method
of branching processes and also with Wright [25] who used the method of
integral equations. For a large N this chance is very close to 2s as given by Haldane
[24]. For a slightly disadvantageous mutant gene (s < 0), we have

\[ u = 2s'/(e^{4Ns'} - 1), \tag{5.7} \]

where s' = -s. The chance is not negligible if Ns' is small. The result agrees
with that obtained by Wright [25].
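Formulas (5.6) and (5.7) are easy to evaluate side by side. The following is a small sketch; the function names and the sample values of N and s are hypothetical choices for illustration, and s here denotes half the homozygote advantage, as in the formulas just given.

```python
import math

def u_no_dominance(s, N):
    """D = 0 result from (5.4) with p = 1/2N and homozygote advantage 2s:
    u = (1 - exp(-2s)) / (1 - exp(-4Ns))."""
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * N * s)

def u_approx(s, N):
    """(5.6) for s > 0 and, with s' = -s, (5.7) for s < 0: u = 2s / (1 - exp(-4Ns))."""
    return 2.0 * s / -math.expm1(-4.0 * N * s)

N = 1000                      # hypothetical population number
for s in (0.01, -0.0005):     # hypothetical selective advantages
    print(s, u_no_dominance(s, N), u_approx(s, N), 1.0 / (2 * N))
```

For the advantageous case the approximation is close to 2s, while for the slightly deleterious case the chance falls below the neutral value 1/2N without vanishing.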

For the completely dominant gene (D = 1) with small selective advantage
s (s > 0) we may use the formula u = 2s unless Ns is small.

The case of a completely recessive mutant gene (D = -1) with small selective
advantage s (s > 0) in the homozygous state is of special interest. Haldane
[24] estimated the chance of fixation as of the order of $\sqrt{s/N}$ using the method
of branching processes and Wright [26] estimated it as of the order of $\sqrt{s/2N}$
by his method of integral equations. Our formula (5.4) gives

\[ u = \sqrt{2s/(\pi N)} \tag{5.8} \]

as the best simple approximation for a large N. Since

\[ \sqrt{2s/\pi N} = \sqrt{2/\pi}\,\sqrt{s/N} = \sqrt{4/\pi}\,\sqrt{s/2N}, \]

it may readily be seen that our result lies between those of Haldane and Wright.
Furthermore it is interesting to note that Wright [26] obtained numerically
the formula $1.1(s/2N)^{1/2}$ as the average chance of fixation for values of s ranging
from 4/2N to 64/2N. The factor 1.1 is indeed very close to $\sqrt{4/\pi}$ which is
1.128....
Finally our general formula (5.4) allows us to calculate the chance of fixation
of a nearly recessive mutant gene with selective advantage s (s > 0) in the
homozygous state. Namely for $0 < h < \tfrac{1}{2}$, we may have

\[ u = e^{-2Nsh^2/(1-2h)}\sqrt{\frac{2s(1 - 2h)}{\pi N}}\Big/\left\{1 - 2\Phi\!\left(\sqrt{4Nsh^2/(1 - 2h)}\right)\right\} \tag{5.9} \]

as a good approximation, unless 2Ns is small. Here $\Phi(x)$ stands for the error
function

\[ \Phi(x) = (1/\sqrt{2\pi})\int_0^x e^{-z^2/2}\,dz. \]
As an example consider a case with N = 10° and s = 10°’. If the mutant gene
is completely recessive (hk = 0), u 0.8 X 10°*. With slight phenotypic effect 
of h = 0.01 in the heterozygote u ~ 0.9 X 10°, while with 


h = 0.1, uw 2.3 xX 10°. 
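The approximations (5.8) and (5.9) can be compared with a direct numerical integration of (5.4). The sketch below is only illustrative: the values N = 10^4 and s = 10^-2 are an assumption made for the example (they give Ns = 100), and the function names are hypothetical. The quantity $1 - 2\Phi(y)$ is evaluated as `erfc(y / sqrt(2))`.

```python
import math

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def u_exact(p, N, s, h_dom):
    """Numerical evaluation of (5.4) with c = Ns and D = 2h - 1."""
    c, D = N * s, 2.0 * h_dom - 1.0
    g = lambda x: math.exp(-2.0 * c * D * x * (1.0 - x) - 2.0 * c * x)
    return simpson(g, 0.0, p) / simpson(g, 0.0, 1.0)

def u_nearly_recessive(N, s, h_dom):
    """Approximation (5.9); it reduces to (5.8) when h_dom = 0."""
    k = 1.0 - 2.0 * h_dom
    tail = math.erfc(math.sqrt(2.0 * N * s * h_dom**2 / k))   # equals 1 - 2*Phi(...)
    return (math.exp(-2.0 * N * s * h_dom**2 / k)
            * math.sqrt(2.0 * s * k / (math.pi * N)) / tail)

N, s = 10_000, 0.01   # assumed values for illustration only
for h_dom in (0.0, 0.01, 0.1):
    print(h_dom, u_exact(1.0 / (2 * N), N, s, h_dom), u_nearly_recessive(N, s, h_dom))
```

With these assumed values the two columns agree to within a fraction of a per cent, and the chance of fixation rises noticeably even for a small degree of dominance.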


6. Population structure and evolution. So far we have considered the process 
of change in gene frequency in an isolated population in which mating is ran- 
dom and the number of individuals remains constant through generations. 
This may be an over-simplification for the study of evolutionary processes in 
general, since most species in nature may have a much more complicated breed- 
ing structure. Unfortunately this immediately brings us baffling problems, for 
the solution of which new techniques will be required. 

First let us suppose that a species is subdivided into numerous isolated colo- 
nies, each of which may receive, from time to time, migrants taken as random 
samples from the whole population. Mating is assumed to be random within 
each colony. Following Wright [27] we will call this the “island model.” The
model may be realistic to describe a species inhabiting an archipelago such as 
the Galapagos Islands studied by Darwin. The number of individuals may 
fluctuate from generation to generation not only due to fluctuation in environ- 
mental conditions but also due to change in the genetic make up of each colony 
which in turn is influenced by the population number. If the number of repro- 
ducing individuals per colony is small, say less than 100, and if isolation is so 
severe that less than one migrant is expected per thousand generations, the 
chance of disadvantageous mutant genes reaching fixation may be considerable, 
as suggested by (5.7), and accumulation of such genes will lead to extinction of 
colonies. We would like to know then what is the chance that an isolated colony 
becomes extinct before a migrant comes in to start a new colony. What is the 
joint distribution of the population number and the gene frequency among 
colonies at the steady state? These questions may have to be answered before 
we reach conclusions on the optimum structure of populations for the evolution 
of a species. 

Next we will consider the continuum model of a population. The model is 
realistic for representing a species inhabiting a wide range with more or less 
uniform density. Here the whole population can not be a random mating unit 
since a tendency toward “isolation by distance” may arise due to limitation in 
the locomotive ability of the organism [27]. In the course of time advantageous 
mutant genes may arise with exceedingly low rate in various spots in the
continuum and these will spread into the population. If the local fluctuation of
gene frequencies is negligible, the process of spread will be very similar to dif- 
fusion of physical particles in a medium, except here that differential rate of 
multiplication is involved among particles. 

Let x(u, v) be the relative frequency of a mutant gene denoted by A at a
point (u, v) in the continuum with rectangular coordinate system. The process
of spread of the advantageous dominant gene may be described by the equation

\[ \frac{\partial x}{\partial t} = m\nabla^2 x + sx(1 - x)^2, \tag{6.1} \]

where m represents locomotive ability of an individual and corresponds to a
diffusion constant in physical systems, $\nabla^2$ denotes the two-dimensional Laplace
operator $(\partial^2/\partial u^2 + \partial^2/\partial v^2)$, and s is the selective advantage of the dominant gene
A to its allele A’. The simplest situation is that s is constant throughout the
continuum. The mutant gene will spread in the form of concentric circles from
the point of origin which we may take as (0, 0). Introducing the polar coordinates
$(r, \theta)$ and assuming that $\partial^2 x/\partial\theta^2 = 0$, (6.1) becomes


\[ \frac{\partial x}{\partial\tau} = \frac{\partial^2 x}{\partial r^2} + \frac{1}{r}\frac{\partial x}{\partial r} + \alpha x(1 - x)^2, \tag{6.2} \]

where $\tau = mt$ and $\alpha = s/m$. At an early stage when the frequency of A is still
low, the distribution may be approximated by

\[ x(r, \tau) = x_0\,e^{\alpha\tau - r^2/4\tau}/2\tau, \]

where $x_0$ is the initial frequency of A at the origin. Beyond this stage, however,
we face a difficult problem of solving a non-linear diffusion equation.

The problem of steady state distribution is worthwhile to investigate if the 
mutant gene is advantageous within a closed region but disadvantageous out- 
side, as in the case of melanic genes in many lepidopteran species which in recent 
years have increased their relative frequencies in a spectacular fashion in many 
industrial areas but remain in low frequencies in rural districts, a phenomenon
known as “industrial melanism” [28].

Real mathematical difficulties arise, however, when we take random fluctua- 
tion of local gene frequencies into consideration. The fluctuation may be due to 
random sampling of gametes in reproduction or due to random fluctuation of 
selection intensities brought about by chance fluctuation of local environments. 
Notable contributions have been made by Wright [27] [29] [30] and Malécot 
[4] [18] for the case of no selection, but more important cases involving selection 
are yet to be worked out. 

Such studies should be indispensable for our understanding of the process of 
speciation and also of the mechanism of evolution in general. 

In his theories of evolution Wright [31] put forward an important concept of 
“balance,” especially of balance between directional factors such as selection, 
mutation, and migration and undirectional or stochastic factors such as random 
sampling of gametes and random fluctuation of environmental conditions. It 
appears that new methods of stochastic processes will be needed for a satisfac- 
tory treatment of Wright’s theory of evolution. 
THE COMPARISON OF THE SENSITIVITIES OF SIMILAR
EXPERIMENTS: THEORY¹


By D. E. W. SCHUMANN² AND R. A. BRADLEY
Virginia Polytechnic Institute


0. Summary. The comparison of the sensitivities of experiments using different 
scales of measurement or different experimental techniques can be effected 
through a comparison of noncentral variance ratios. The distribution of the ratio 
of two noncentral variance ratios is obtained and its properties are discussed. 
Based on this distribution, tests of hypotheses on the parameters of noncentrality
of two noncentral variance-ratio distributions are developed. 

It is shown that the distribution of the ratio of two noncentral variance ratios 
may be approximated adequately by the distribution of the ratio of two central 
variance ratios with appropriately adjusted degrees of freedom. A table for use 
in applications of the latter distribution is given for one-sided tests at the 5% 
level of significance. 

Through the association of the distribution of the multiple correlation co- 
efficient in regression models with that of the noncentral variance ratio, it was 
also possible to develop test procedures on multiple correlation coefficients. 

Much of the discussion in this paper is on comparisons of similar experi- 
ments in the sense that variance ratios with the same degrees of freedom are 
compared. However, it is shown how these results may be generalized for com- 
parisons of dissimilar experiments. 


1. Introduction. The problem of comparing different scales of measurement 
for experimental results was discussed by Cochran [1] in considerable detail in 
1943. He assumed that analysis of variance techniques were applicable and con- 
fined his attention to the case in which all scales measure the same experiment. 
It was noted that a comparison of the sensitivities of two scales should depend 
both on the experimental errors associated with them and on the magnitudes of 
the treatment effects in the scales. In the concluding section of his paper Cochran 
indicated how a result of Pitman [7] may be used to compare the sensitivities of 
two scales in two-treatment experiments and went on to state that in general 
the comparison should depend on a test of significance of a hypothesis on the 
parameters of two noncentral variance-ratio distributions. It is this problem 
that is considered in this paper. The results will be useful not only in comparing 
scales of measurement per se but in comparing different experimental techniques 
in a broader sense in similar experiments and, under certain conditions, in com- 
paring two population multiple correlation coefficients. 
Let $\dot{F}$ be a noncentral variance ratio with 2a and 2b degrees of freedom (it
is convenient to use the degrees of freedom in this form; a and b may be integers
or half-integers and no generality is lost). Then we define

\[ \dot{u} = 2a\dot{F}/2b \tag{1} \]

and $\dot{u}$ has the density function,

\[ f(\dot{u}; a, b, \lambda) = f(\dot{u}; a, b)\,e^{-\lambda}\,{}_1F_1(a + b, a, \lambda\dot{u}/(1 + \dot{u})). \tag{2} \]

Here $\lambda$ is the parameter of noncentrality associated with $\dot{F}$ and $\dot{u}$,

\[ f(u; a, b) = f(u; a, b, 0) = [B(a, b)]^{-1}u^{a-1}(1 + u)^{-(a+b)}, \tag{3} \]

where B denotes the beta function, and

\[ {}_1F_1(\alpha, \beta, x) = 1 + \frac{\alpha}{\beta}x + \frac{\alpha(\alpha + 1)}{\beta(\beta + 1)}\frac{x^2}{2!} + \cdots \]

is the confluent hypergeometric function.³ If u is related to a central variance
ratio F through (1), f(u; a, b) is the density function of u. References [3], [13],
and [11] are noted for discussions bearing on derivations of (2).

In Model I of the analysis of variance with so-called fixed parameters in the
additive model, if $\tau_i$ is the effect of the ith of t treatments, $\sum_{i=1}^{t}\tau_i = 0$, and,
if $\sigma^2$ is the population experimental error,

\[ \lambda = k\sum_{i=1}^{t}\tau_i^2/2\sigma^2, \tag{4} \]

where k is the number of observations in each treatment mean. We see at once
that $\lambda$ is a parameter incorporating both the experimental error associated with
the scale and the magnitudes of treatment effects in the scale. We take $\dot{u}_1$ and $\dot{u}_2$
as defined in (1) to be the appropriate statistics for two similar experiments on
which to base comparisons of the sensitivities of the two scales or experimental
techniques. We shall consider the distribution of

\[ \dot{w} = \dot{u}_1/\dot{u}_2 \tag{5} \]

under the assumption that the two similar experiments are independent.

The term “similar experiments” has been used in the sense that F-ratios for
treatment comparisons resulting from them have the same degrees of freedom.
When, in addition, $k_1 = k_2$, $k_i$ being the number of observations in each treatment
mean of experiment i, we shall call the two experiments identical. Major emphasis
will be placed on the distribution of $\dot{w}$ when $\dot{u}_1$ and $\dot{u}_2$ arise from identical
experiments. Then the null hypothesis of equal sensitivities is equivalent to the
hypothesis that the two parameters of noncentrality, $\lambda_1$ and $\lambda_2$, associated with
$\dot{u}_1$ and $\dot{u}_2$ are equal. When the experiments are similar but not identical, the
hypothesis should be on the equality of $\lambda_1/k_1$ and $\lambda_2/k_2$.
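The density (2) can be evaluated from the series for ${}_1F_1$ and checked to integrate to one. The sketch below is an illustration only: the function names are hypothetical, the values a = 2, b = 3, $\lambda$ = 1.5 are arbitrary, and the substitution u = y/(1 - y) is used simply to map the range of integration onto (0, 1).

```python
import math

def hyp1f1(alpha, beta, x, terms=80):
    """Partial sum of the confluent hypergeometric series 1F1(alpha, beta, x)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (alpha + k) / (beta + k) * x / (k + 1)
    return total

def central_density(u, a, b):
    """(3): f(u; a, b) = u^(a-1) (1+u)^-(a+b) / B(a, b), for u > 0."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(u) - (a + b) * math.log1p(u) - log_beta)

def noncentral_density(u, a, b, lam):
    """(2): f(u; a, b, lam) = f(u; a, b) exp(-lam) 1F1(a+b, a, lam*u/(1+u))."""
    return central_density(u, a, b) * math.exp(-lam) * hyp1f1(a + b, a, lam * u / (1.0 + u))

def total_probability(a, b, lam, n=4000):
    """Midpoint rule in y, where u = y/(1-y) and du = dy/(1-y)^2."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        y = (k + 0.5) * h
        u = y / (1.0 - y)
        total += noncentral_density(u, a, b, lam) / (1.0 - y) ** 2
    return total * h

print(total_probability(2.0, 3.0, 1.5))   # should be close to 1
print(total_probability(2.0, 3.0, 0.0))   # central case, also close to 1
```

Setting $\lambda$ = 0 recovers the central density (3), which gives a second consistency check.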


3 We shall use the “dot” notation to indicate variates and distributions associated with
noncentral variance ratios. Functional forms will be abbreviated, e.g., $f(\dot{u}; a, b, \lambda)$ to $f(\dot{u})$,
whenever it may be done without confusion.
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While values of the distribution function of w will be obtained for small a and 
b, adequate tabulation of the distribution would be difficult with ordinary com- 
puting facilities and would result in a three-parameter classification even for a 
specified level of significance. Similar difficulties entered in the tabulation of the 
distribution function of a although tables and charts are available ({11], [4], 
[6], and [12]) and Patnaik [5] proposed an approximation in that case. Patnaik 
essentially approximated to the density f(%; a, b, ) using the density 


on u ——— ; . . 
(1 Ky( 2; a’, b); the two densities for u were given equal first and second mo- 


ments through choice of K and a’. In our notation,* it is required that 

(6) a’ = (a+ )’/(a+ 2d) and K = (a+ )/a. 
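
As a small numerical illustration of (6), the two constants can be computed directly. The sketch below is ours, not part of the original paper; the function name `patnaik_parameters` is hypothetical and the arguments follow the paper's convention of $2a$ degrees of freedom and noncentrality $\lambda$.

```python
def patnaik_parameters(a, lam):
    """Return (a_prime, K) as defined in equation (6).

    a   : half the numerator degrees of freedom (2a degrees of freedom)
    lam : noncentrality parameter in the paper's convention
    """
    a_prime = (a + lam) ** 2 / (a + 2 * lam)
    K = (a + lam) / a
    return a_prime, K


if __name__ == "__main__":
    # Example call with a = 2 and lambda = 1.
    print(patnaik_parameters(2, 1))
```

When $\lambda = 0$ the constants reduce to $a' = a$ and $K = 1$, so the approximating distribution coincides with the central one.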

In this paper we consider approximating to both the distributions of $\dot{u}_1$ and $\dot{u}_2$
following Patnaik's method and then obtain the distribution of $\dot{w}$ on the
assumption that $\dot{w}$ is the ratio of two independent variates with distributions of the
form (3). The distribution of the ratio of central variates,

(7)  $w = u_1/u_2$,

is then of interest. It will be shown that the distribution function of $w$ with $2a'$
and $2b$ degrees of freedom is a good approximation to the distribution function of
$\dot{w}$ with $2a$ and $2b$ degrees of freedom given $\lambda_1 = \lambda_2 = \lambda$, on the basis of a
comparison of available percentage points of the two distributions.

Values of $w_0$ such that $P(w \ge w_0) = .05$ have been tabulated for ranges of
values of $a'$ and $b$. The computation of such tables will be discussed.


2. The distribution of the ratio of similar noncentral variance ratios. The
marginal distribution of $\dot{w}$ in (5) is obtainable from the joint distribution of
$\dot{u}_1$ and $\dot{u}_2$, written as the product of two expressions like (2) on the assumption of
independence of $\dot{u}_1$ and $\dot{u}_2$ and given similar experiments in that both variates
depend on $2a$ and $2b$ degrees of freedom. With the specification of $\dot{f}_1$ and $\dot{f}_2$ in
(2), this joint density function is⁵

$$e^{-\lambda_1-\lambda_2}\sum_{r}\sum_{s}\frac{\lambda_1^{r}\lambda_2^{s}}{r!\,s!}\,[B(a+r,b)\,B(a+s,b)]^{-1}\,\dot{u}_1^{\,a+r-1}\,\dot{u}_2^{\,a+s-1}\,(1+\dot{u}_1)^{-(a+b+r)}\,(1+\dot{u}_2)^{-(a+b+s)}.$$
When we define new variates $\dot{w}$ and $x$ through the relations

$$\dot{u}_1 = \dot{w}(x - 1)/(\dot{w} - x) \quad\text{and}\quad \dot{u}_2 = (x - 1)/(\dot{w} - x),$$

$\dot{w}$ has the definition (5) and we may write down the joint distribution of $\dot{w}$ and $x$.
From that joint distribution it is at once evident that the marginal distribution
of $\dot{w}$ is

(8)  $$\dot{g}(\dot{w}; a, b, \lambda_1, \lambda_2) = e^{-\lambda_1-\lambda_2}\sum_{r}\sum_{s}\frac{\lambda_1^{r}\lambda_2^{s}}{r!\,s!}\,[B(a+r,b)\,B(a+s,b)]^{-1}\,\dot{w}^{\,a+r-1}\,H(\dot{w}; a, b, r, s), \qquad 0 \le \dot{w} \le \infty,$$

where

(9)  $$H(\dot{w}) = (\dot{w}-1)^{-(2a+2b+r+s-1)}\int_{1}^{\dot{w}} (x-1)^{2a+r+s-1}\,(\dot{w}-x)^{2b-1}\,x^{-(a+b+r)}\,dx.$$

The form (8) is adequate for all values of $\dot{w}$ including $\dot{w} = 1$ in view of the
following form (10) for $H(\dot{w})$.

The integral (9) appearing in (8) may be written in terms of hypergeometric
functions following Erdélyi ([2], p. 115). Then with the transformations $y = (x - 1)/(\dot{w} - x)$
when $0 \le \dot{w} \le 1$ and $y = \dot{w}(x - 1)/(\dot{w} - x)$ when $1 \le \dot{w} \le \infty$,

(10)  $$H(\dot{w}) = \int_{0}^{\infty} y^{2a+r+s-1}\,(1+\dot{w}y)^{-(a+b+r)}\,(1+y)^{-(a+b+s)}\,dy = B(2a+r+s,\,2b)\;{}_2F_1(a+b+r,\;2a+r+s,\;2a+2b+r+s,\;1-\dot{w}), \qquad 0 \le \dot{w} \le 1,$$

(11)  $$H(\dot{w}) = \dot{w}^{-(2a+r+s)}\int_{0}^{\infty} y^{2a+r+s-1}\,\Bigl(1+\frac{y}{\dot{w}}\Bigr)^{-(a+b+s)}(1+y)^{-(a+b+r)}\,dy = \dot{w}^{-(2a+r+s)}\,B(2a+r+s,\,2b)\;{}_2F_1\bigl(a+b+s,\;2a+r+s,\;2a+2b+r+s,\;(\dot{w}-1)/\dot{w}\bigr), \qquad 1 \le \dot{w} \le \infty.$$

$H(\dot{w})$ is expressed in the two forms for convenience; $H(1)$ may be obtained
from either form and there is no discontinuity for $H(\dot{w})$ at $\dot{w} = 1$. The integrals
in (10) and (11) hold for $0 \le \dot{w} \le \infty$; the division is made only to obtain
expressions in terms of convergent hypergeometric series. Final forms for $\dot{g}(\dot{w})$
could now be obtained by substitution for $H(\dot{w})$ in (8).

From the specification of $H(\dot{w})$ in (9) it is clear that we could obtain a finite
series expansion using binomial expansions in the integrand, for $2a$, $2b$, $r$, and $s$
are integers. These finite sums can of course be related to the functions ${}_2F_1$ in
(10) and (11).

4 Patnaik's notation differs from that used here. His degrees of freedom, $\nu_1$, $\nu_2$, and $\nu$,
correspond to $2a$, $2b$, and $2a'$, respectively; his $\lambda$ is twice our $\lambda$. $K$ in our notation corresponds
to Patnaik's $k$.

5 $r$ and $s$ take values $0, \ldots, \infty$ unless otherwise specified.
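
The two forms (10) and (11) can be checked numerically. The following Python sketch is an illustration under our own assumptions, not the authors' computing method: it maps the integral in (10) to the unit interval by $y = t/(1-t)$ and applies Simpson's rule. The helper names `beta`, `simpson`, and `H` are ours.

```python
import math


def beta(x, y):
    """Complete beta function B(x, y), computed through log-gamma."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def H(w, a, b, r=0, s=0):
    """H(w; a, b, r, s) of (10) after the substitution y = t / (1 - t).

    The integral over (0, inf) becomes
        int_0^1 t^(2a+r+s-1) (1-t)^(2b-1) (1 + (w-1) t)^-(a+b+r) dt,
    whose integrand is bounded when 2a >= 1 and 2b >= 1.
    """
    m = 2 * a + r + s
    p = a + b + r

    def integrand(t):
        return t ** (m - 1) * (1 - t) ** (2 * b - 1) * (1 + (w - 1) * t) ** (-p)

    return simpson(integrand, 0.0, 1.0)
```

At $\dot{w} = 1$ the routine should reproduce $B(2a+r+s, 2b)$, and replacing $y$ by $y/\dot{w}$ in (10) gives the relation $H(\dot{w}; a, b, r, s) = \dot{w}^{-(2a+r+s)} H(1/\dot{w}; a, b, s, r)$, which is the content of (11).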


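3. Properties of the distribution, $\dot{g}(\dot{w})$. (i) Bounds and values of $\dot{g}(\dot{w})$. When
$\dot{w} \ge 1$, with the replacement of $(1 + \dot{w}y)$ by $(1 + y)$ in the denominator of the
integrand of (10), it is apparent that $H(\dot{w}) \le B(2a + r + s, 2b)$ and then,
from (8),

(12)  $$\dot{g}(\dot{w}) \le e^{-\lambda_1-\lambda_2}\sum_{r}\sum_{s}\frac{\lambda_1^{r}\lambda_2^{s}}{r!\,s!}\,\dot{w}^{\,a+r-1}\,B(2a+r+s,\,2b)\,[B(a+r,b)\,B(a+s,b)]^{-1}, \qquad 1 \le \dot{w} \le \infty.$$

Similarly, when $0 \le \dot{w} \le 1$, with the replacement of $[1 + (y/\dot{w})]$ by $(1 + y)$
in the denominator of the integrand of (11), we have

$$H(\dot{w}) \le \dot{w}^{-(2a+r+s)}\,B(2a + r + s, 2b)$$

and again from (8)

(13)  $$\dot{g}(\dot{w}) \le e^{-\lambda_1-\lambda_2}\sum_{r}\sum_{s}\frac{\lambda_1^{r}\lambda_2^{s}}{r!\,s!}\,\dot{w}^{-(a+s+1)}\,B(2a+r+s,\,2b)\,[B(a+r,b)\,B(a+s,b)]^{-1}, \qquad 0 \le \dot{w} \le 1.$$

But $B(2a + r + s, 2b)[B(a + r, b)B(a + s, b)]^{-1} = B(a + b + r, a + b + s)[B(b, b)B(a + r, a + s)]^{-1} < 1/B(b, b)$ and

(14)  $\dot{g}(\dot{w}) < e^{\lambda_1(\dot{w}-1)}\,\dot{w}^{\,a-1}/B(b, b), \qquad 1 \le \dot{w} < \infty,$

and

(15)  $\dot{g}(\dot{w}) < e^{\lambda_2(1-\dot{w})/\dot{w}}\,\dot{w}^{-(a+1)}/B(b, b), \qquad 0 < \dot{w} \le 1.$

Now convergence of the series for $\dot{g}(\dot{w})$ is established for $0 < \dot{w} < \infty$, for all
terms of that series are positive.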


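Limits for $\dot{g}(\dot{w})$ as $\dot{w} \to 0$ and as $\dot{w} \to \infty$ may be obtained. Returning to (8)
and (9), we note that

$$\dot{w}^{\,a+r-1}H(\dot{w}) = \frac{\dot{w}^{\,a+r-1}}{(1-\dot{w})^{2a+2b+r+s-1}}\int_{\dot{w}}^{1}(1-x)^{2a+r+s-1}(x-\dot{w})^{2b-1}x^{-(a+b+r)}\,dx$$

$$< \frac{\dot{w}^{\,\epsilon}}{(1-\dot{w})^{2a+2b+r+s-1}}\int_{\dot{w}}^{1}(1-x)^{2a+r+s-1}x^{\,b-2-\epsilon}\,dx, \qquad \dot{w} < 1,\ \epsilon > 0,\ a > 1+\epsilon,$$

$$< \frac{\dot{w}^{\,\epsilon}}{(1-\dot{w})^{2a+2b+r+s-1}}\,B(2a+r+s,\;b-1-\epsilon), \qquad \dot{w} < 1,\ \epsilon > 0,\ a, b > 1+\epsilon.$$

Hence $\lim_{\dot{w}\to 0} \dot{w}^{\,a+r-1}H(\dot{w}) = 0$ if $a, b > 1$, for $\epsilon$ is at our disposal. This implies
that $\lim_{\dot{w}\to 0} \dot{g}(\dot{w}) = 0$ if $a, b > 1$, which will be the case in all practical situations.
A similar argument shows that $\lim_{\dot{w}\to\infty} \dot{g}(\dot{w}) = 0$ if $a, b > 1$.

In summary,

(17)
$\dot{g}(0) = 0, \qquad a, b > 1,$
$\dot{g}(1) = e^{-\lambda_1-\lambda_2}\sum_{r}\sum_{s}\dfrac{\lambda_1^{r}\lambda_2^{s}}{r!\,s!}\,B(2a + r + s, 2b)\,[B(a + r, b)B(a + s, b)]^{-1} < 1/B(b, b),$
$\dot{g}(\infty) = 0, \qquad a, b > 1,$

and the series for $\dot{g}(\dot{w})$ converges for $0 \le \dot{w} \le \infty$. When $\lambda_1 = \lambda_2$, as will later
be required, $P(\dot{u}_1 \le \dot{u}_2) = P(\dot{u}_2 \le \dot{u}_1) = \tfrac{1}{2}$ and we see that $\dot{g}(\dot{w})$ then has a median
at $\dot{w} = 1$. In general $\dot{g}(\dot{w})$ is a unimodal distribution with mode between 0 and 1
and approaches 0 as $\dot{w} \to 0$ and as $\dot{w} \to \infty$.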
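
The value of the density at $\dot{w} = 1$ and its bound $1/B(b, b)$ can be illustrated by truncating the double series. The Python sketch below is a hedged illustration: the truncation point `terms` and the function names are our assumptions, and the series is simply cut off once the Poisson-type weights are negligible.

```python
import math


def beta(x, y):
    """Complete beta function B(x, y), computed through log-gamma."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))


def poisson_weights(lam, terms):
    """Return [lam^j / j! for j < terms] by recursion (exact for lam = 0)."""
    weights, term = [], 1.0
    for j in range(terms):
        weights.append(term)
        term *= lam / (j + 1)
    return weights


def g_dot_at_one(a, b, lam1, lam2, terms=60):
    """Truncated double series for the noncentral density at w = 1."""
    wr = poisson_weights(lam1, terms)
    ws = poisson_weights(lam2, terms)
    total = 0.0
    for r in range(terms):
        for s in range(terms):
            # H(1) = B(2a + r + s, 2b), so no quadrature is needed here.
            ratio = beta(2 * a + r + s, 2 * b) / (beta(a + r, b) * beta(a + s, b))
            total += wr[r] * ws[s] * ratio
    return math.exp(-lam1 - lam2) * total
```

With $\lambda_1 = \lambda_2 = 0$ only the term $r = s = 0$ survives and the value reduces to $B(2a, 2b)/[B(a, b)]^2$.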


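(ii) Moments of $\dot{g}(\dot{w})$. The $k$th moment of $\dot{w}$, $E(\dot{w}^k) = \dot{\mu}_k$, can be obtained most
easily from the joint distribution of $\dot{u}_1$ and $\dot{u}_2$ and the definition (5) for $\dot{w}$. Then
it is a simple problem in integration to show that

(18)  $$\dot{\mu}_k = e^{-\lambda_1-\lambda_2}\,\frac{\Gamma(b+k)\Gamma(b-k)}{[\Gamma(b)]^2}\sum_{r}\sum_{s}\frac{\lambda_1^{r}\lambda_2^{s}}{r!\,s!}\,\frac{B(a+k+r,\;a-k+s)}{B(a+r,\;a+s)}$$
$$= e^{-\lambda_1-\lambda_2}\,\frac{\Gamma(b+k)\Gamma(b-k)}{[\Gamma(b)]^2}\,\frac{\Gamma(a+k)\Gamma(a-k)}{[\Gamma(a)]^2}\;{}_1F_1(a+k, a, \lambda_1)\;{}_1F_1(a-k, a, \lambda_2)$$

for $a, b > k$. When $k = 0$, it is clear that $\dot{\mu}_0 = 1$.
When $k = 1$, we sum (18) first with respect to $r$ and obtain

$$\dot{\mu}_1 = e^{-\lambda_2}\,\frac{b(a+\lambda_1)}{b-1}\sum_{s}\frac{\lambda_2^{s}}{s!\,(a+s-1)}.$$

This may be rewritten as the integral (obtained by multiplying each term in
the series by $\lambda_2^{a-1}$ and noting that it is an integral of $\lambda_2^{a+s-2}$, followed by some
reduction),

(19)  $$\dot{\mu}_1 = \frac{b(a+\lambda_1)}{b-1}\int_{0}^{1} u^{a-2}\,e^{\lambda_2(u-1)}\,du.$$

Expansion of the exponential in the integrand of (19) and subsequent integration
yields

(20)  $$\dot{\mu}_1 = \frac{(a+\lambda_1)\,b}{(a-1)(b-1)}\Bigl[1 - \frac{\lambda_2}{a} + \frac{\lambda_2^2}{a(a+1)} - \cdots\Bigr], \qquad a, b > 1.$$

In the special case where $a = 2$, it is apparent from (19) that

$$\dot{\mu}_1 = b(2 + \lambda_1)(1 - e^{-\lambda_2})/(b - 1)\lambda_2 .$$

When $k = 2$, reduction similar to that when $k = 1$ yields

(21)  $$\dot{\mu}_2 = a(a + 1)b(b + 1)[(a - 1)(a - 2)(b - 1)(b - 2)]^{-1}\,[1 + 2\lambda_1/a + \lambda_1^2/a(a + 1)]\,[1 - 2\lambda_2/a + 3\lambda_2^2/a(a + 1) - \cdots], \qquad a, b > 2.$$

When $a$ and/or $b$ is $\le 1$, $\dot{\mu}_k$, $k = 1$, does not exist; when $a$ and/or $b \le 2$, $\dot{\mu}_k$,
$k = 2$, does not exist.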
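
The series (20) is easy to evaluate term by term, and the special case $a = 2$ gives a closed form against which it can be checked. The following is a minimal Python sketch under our own naming (`mean_series`, `mean_closed_form_a2`); the fixed number of terms is an assumption suited to moderate $\lambda_2$.

```python
import math


def mean_series(a, b, lam1, lam2, terms=200):
    """First moment of the noncentral ratio from (20); requires a, b > 1."""
    series, term = 0.0, 1.0
    for j in range(terms):
        series += term
        # Next term: (-lam2)^(j+1) / [a (a+1) ... (a+j)].
        term *= -lam2 / (a + j)
    return (a + lam1) * b / ((a - 1) * (b - 1)) * series


def mean_closed_form_a2(b, lam1, lam2):
    """Special case a = 2 quoted after (20); requires b > 1 and lam2 > 0."""
    return b * (2 + lam1) * (1 - math.exp(-lam2)) / ((b - 1) * lam2)
```

Setting $\lambda_1 = \lambda_2 = 0$ in `mean_series` returns $ab/(a-1)(b-1)$, the mean of the central ratio.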


4. The distribution of the ratio of noncentral variance ratios (general). While
the distribution of $\dot{w}$ obtained in Sec. 2 will usually be the one required in
practical work, it is not much more difficult to obtain more general results.

Consider independent variates $\dot{F}_1$ and $\dot{F}_2$, $\dot{u}_1 = a_1\dot{F}_1/b_1$, $\dot{u}_2 = a_2\dot{F}_2/b_2$,
and $\dot{w} = c\,\dot{u}_1/\dot{u}_2$, where $\dot{F}_1$ and $\dot{F}_2$ are noncentral variance ratios with degrees of
freedom $2a_1$ and $2b_1$, and $2a_2$ and $2b_2$, with $c = a_2b_1/a_1b_2$ and parameters of
noncentrality $\lambda_1$ and $\lambda_2$, respectively. $\dot{u}_1$ and $\dot{u}_2$ will then have distributions of the
form (2). An argument similar to that of Sec. 2 yields for the density function of $\dot{w}$

(22)  $$\dot{g}(\dot{w}; a_1, b_1, a_2, b_2, \lambda_1, \lambda_2) = e^{-\lambda_1-\lambda_2}\sum_{r}\sum_{s}\frac{\lambda_1^{r}\lambda_2^{s}}{r!\,s!}\,[B(a_1+r, b_1)\,B(a_2+s, b_2)]^{-1}\,c^{-(a_1+r)}\,\dot{w}^{\,a_1+r-1}\,H(\dot{w}; a_1, b_1, a_2, b_2, r, s),$$

where

(23)  $$H(\dot{w}) = \int_{0}^{\infty} y^{a_1+a_2+r+s-1}\,(1+\dot{w}y/c)^{-(a_1+b_1+r)}\,(1+y)^{-(a_2+b_2+s)}\,dy,$$

similar to (10). The integral (23) can be used to express $H(\dot{w})$ in terms of a
hypergeometric series when $0 \le \dot{w} \le c$ and it can be transformed to a form similar to
(11) and thence in terms of a hypergeometric series when $c \le \dot{w} \le \infty$.

This general case is not of interest in comparing the sensitivities of
experiments except as it reduces to the case of similar experiments and possibly when
$a_1 = a_2$, $b_1 \ne b_2$.


5. The distribution of the ratio of central variance ratios and its properties.
We have already indicated in Sec. 1 that we shall approximate to the distribution
of the ratio of two independent noncentral variance ratios using the distribution
of the ratio of two independent central variance ratios. The necessary
distribution and its properties may be considered by the specialization $\lambda_1 = \lambda_2 = 0$ in
results given above. We now summarize those results for this special case, for
they will be required in following sections.⁶

General results (22) and (23) now become

(24)  $g(w; a_1, b_1, a_2, b_2) = c^{-a_1}[B(a_1, b_1)B(a_2, b_2)]^{-1}\,w^{a_1-1}\,H(w; a_1, b_1, a_2, b_2), \qquad 0 \le w \le \infty,$

where

(25)  $$H(w) = \int_{0}^{\infty} y^{a_1+a_2-1}\,(1+wy/c)^{-(a_1+b_1)}\,(1+y)^{-(a_2+b_2)}\,dy.$$

We consider in more detail the case for which $F_1$ and $F_2$ arise from similar
experiments. Then, with $a_1 = a_2 = a$, $b_1 = b_2 = b$, we have

(26)  $g(w; a, b) = w^{a-1}H(w; a, b)/[B(a, b)]^2, \qquad 0 \le w \le \infty,$

with

(27)  $$H(w) = \int_{0}^{\infty} y^{2a-1}\,[(1 + wy)(1 + y)]^{-(a+b)}\,dy.$$


6 We retain the notation already adopted but drop the "dot" when discussing central
variates and their distributions.

$H(w)$ in (27) may be rewritten from (10), (11), and (9) respectively as

(28)
$H(w) = B(2a, 2b)\;{}_2F_1(a + b,\ 2a,\ 2a + 2b,\ 1 - w), \qquad 0 \le w \le 1,$
$\phantom{H(w)} = w^{-2a}\,B(2a, 2b)\;{}_2F_1[a + b,\ 2a,\ 2a + 2b,\ (w - 1)/w], \qquad 1 \le w \le \infty,$
$\phantom{H(w)} = (w - 1)^{-(2a+2b-1)}\displaystyle\int_{1}^{w} (x - 1)^{2a-1}(w - x)^{2b-1}x^{-(a+b)}\,dx, \qquad 0 \le w \le \infty.$

We now list certain special cases of $g(w)$ that were used to some extent in
checking tables prepared by more general methods in Sec. 8. These results are
most readily obtained from the third form of (28).

Parameters                      g(w)
(i) a = 1/2, b = 1/2.           $\ln w/\pi^2\sqrt{w}\,(w - 1)$
(ii) a = 1/2, b = 1.            $1/2\sqrt{w}\,(1 + \sqrt{w})^2$
(iii) a = 1/2, b = 3/2.         $4(w^2 - 1 - 2w \ln w)/\pi^2\sqrt{w}\,(w - 1)^3$
(iv) b = a + 1/2.               $[2B(2a, 2a)]^{-1}\,w^{a-1}(1 + \sqrt{w})^{-4a}$


7 , : a (—1)" 422 cosh x < i+1 
v) a = 3, (a +5) an integer. }e | <BGb) anh® 2 +— x (—1) 


i=1 


1 a 77 2 
BOtRDTI- Dein s], 2 =4inw. 


B(b,b 4+ 1 t) 


Results similar to (iv) may be given for $b = a - 1/2$, $b = a + 3/2$, $b = a + 5/2$,
and $b = a + 7/2$ without much trouble.
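
The closed forms for $a = b = 1/2$, for $a = 1/2$, $b = 1$, and for $b = a + 1/2$ can be compared with a direct numerical evaluation of (26) and (27). The Python sketch below is our own check, not the authors' procedure; it maps the integral (27) to the unit interval by $y = t/(1-t)$ and uses Simpson's rule, and all function names are hypothetical.

```python
import math


def beta(x, y):
    """Complete beta function B(x, y), computed through log-gamma."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def g_central(w, a, b):
    """Central density (26), with H(w) of (27) rewritten on (0, 1)."""
    def integrand(t):
        return (t ** (2 * a - 1) * (1 - t) ** (2 * b - 1)
                * (1 + (w - 1) * t) ** (-(a + b)))
    return w ** (a - 1) * simpson(integrand, 0.0, 1.0) / beta(a, b) ** 2


def g_case_i(w):
    """Closed form for a = 1/2, b = 1/2 (w != 1)."""
    return math.log(w) / (math.pi ** 2 * math.sqrt(w) * (w - 1))


def g_case_ii(w):
    """Closed form for a = 1/2, b = 1."""
    return 1.0 / (2 * math.sqrt(w) * (1 + math.sqrt(w)) ** 2)


def g_case_iv(w, a):
    """Closed form for b = a + 1/2."""
    return w ** (a - 1) / (2 * beta(2 * a, 2 * a) * (1 + math.sqrt(w)) ** (4 * a))
```

Case (ii) is the instance $a = 1/2$ of case (iv), which gives a further internal consistency check.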


Results on the form of $\dot{g}(\dot{w})$ carry over to $g(w)$. Thus the limit of $g(w)$ is zero
as $w \to 0$ and as $w \to \infty$. In addition the median value of $w$ is unity, as it was
for $\dot{w}$ with $\lambda_1 = \lambda_2$. Moments of $w$ about zero follow from the more general case
also. Now

(29)  $\mu_k = B(a + k, b - k)B(a - k, b + k)/[B(a, b)]^2.$

In particular,

(30)  $\mu_1 = ab/(a - 1)(b - 1), \qquad a, b > 1,$

and

(31)  $\mu_2 = ab(a + 1)(b + 1)/(a - 1)(b - 1)(a - 2)(b - 2), \qquad a, b > 2.$

The variance of $w$ is

(32)  $\sigma^2 = ab(2a^2b + 2ab^2 - a^2 - b^2 - 4ab + 1)/(a - 1)^2(b - 1)^2(a - 2)(b - 2).$
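
Formulas (29) to (32) can be cross-checked numerically, since the variance must equal $\mu_2 - \mu_1^2$. A minimal Python sketch, with function names of our own choosing:

```python
import math


def beta(x, y):
    """Complete beta function B(x, y), computed through log-gamma."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))


def central_moment_about_zero(k, a, b):
    """mu_k of (29); requires a, b > k."""
    return beta(a + k, b - k) * beta(a - k, b + k) / beta(a, b) ** 2


def central_variance(a, b):
    """Variance (32) of the central ratio; requires a, b > 2."""
    num = a * b * (2 * a * a * b + 2 * a * b * b - a * a - b * b - 4 * a * b + 1)
    den = (a - 1) ** 2 * (b - 1) ** 2 * (a - 2) * (b - 2)
    return num / den
```

For example, with $a = 3$ and $b = 4$, (30) and (31) give $\mu_1 = 2$ and $\mu_2 = 20$, so that $\sigma^2 = 16$, in agreement with (32).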


Consider $\dot{w}$ for similar experiments with $2a$ and $2b$ degrees of freedom and
assume that $\lambda_1 = \lambda_2 = \lambda$. We then compare the moments of $\dot{w}$ with the moments
of $w$ for similar experiments with $2a'$ and $2b$ degrees of freedom with
$a' = (a + \lambda)^2/(a + 2\lambda)$ as given in (6). From (30) it now follows that

$$\mu_1 = (a + \lambda)b\Big/(a - 1)(b - 1)\Bigl[1 + \frac{\lambda(a + \lambda - 1)}{(a + \lambda)(a - 1)}\Bigr]$$
$$\doteq (a + \lambda)b/(a - 1)(b - 1)(1 + \lambda/a) \quad \text{for large } a.$$

Also, from (20),

$$\dot{\mu}_1 = (a + \lambda)b\Bigl[1 - \frac{\lambda}{a} + \frac{\lambda^2}{a(a+1)} - \cdots\Bigr]\Big/(a - 1)(b - 1)$$
$$\doteq (a + \lambda)b/(a - 1)(b - 1)(1 + \lambda/a) \quad \text{for large } a.$$

Hence $\mu_1$ and $\dot{\mu}_1$ are approximately equal under the stated conditions for large $a$.
Similarly it may be shown that, for large $a$,

$$\mu_2 \doteq \dot{\mu}_2 \doteq b(b + 1)(a^2 + 2a\lambda + \lambda^2 + a + 2\lambda)/(a - 1)(b - 1)(a - 2)(b - 2)(1 + \lambda/a)^2.$$
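
The agreement of the first moments for large $a$ can be illustrated numerically. The sketch below, written under our own naming and with an arbitrary choice of parameter values, evaluates the noncentral mean from (20) with $\lambda_1 = \lambda_2 = \lambda$ and the central mean (30) at $a'$ from (6).

```python
def noncentral_mean(a, b, lam, terms=200):
    """Mean of the noncentral ratio from (20) with lam1 = lam2 = lam."""
    series, term = 0.0, 1.0
    for j in range(terms):
        series += term
        term *= -lam / (a + j)
    return (a + lam) * b / ((a - 1) * (b - 1)) * series


def approximating_central_mean(a, b, lam):
    """Mean (30) of the central ratio with a replaced by a' of (6)."""
    a_prime = (a + lam) ** 2 / (a + 2 * lam)
    return a_prime * b / ((a_prime - 1) * (b - 1))


if __name__ == "__main__":
    for a in (3, 10, 20):
        print(a, noncentral_mean(a, 10, 2.0), approximating_central_mean(a, 10, 2.0))
```

The two means coincide exactly when $\lambda = 0$; for $\lambda > 0$ the difference shrinks as $a$ grows, which is the sense of the statement "for large $a$" above.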
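6. The probability integral of $\dot{g}(\dot{w})$. We first consider $\dot{w}$ based on similar
experiments, $\dot{g}(\dot{w})$ in (8), and turn our attention to means of evaluating
$\dot{G}(\dot{w}_0) = P(\dot{w} \le \dot{w}_0) = \int_0^{\dot{w}_0} \dot{g}(\dot{w})\,d\dot{w}$. With interchange of order of integration and the
definition of $H(\dot{w})$ in (10), it follows that

(33)  $$\dot{G}(\dot{w}_0; a, b, \lambda_1, \lambda_2) = e^{-\lambda_1-\lambda_2}\sum_{r}\sum_{s}\frac{\lambda_1^{r}\lambda_2^{s}}{r!\,s!}\,[B(a + r, b)B(a + s, b)]^{-1}\int_{0}^{\infty} y^{a+s-1}(1+y)^{-(a+b+s)}\int_{0}^{\dot{w}_0}(y\dot{w})^{a+r-1}(1+y\dot{w})^{-(a+b+r)}\,y\,d\dot{w}\,dy.$$

Transformation from $\dot{w}$ through setting $x = \dot{w}y/(1 + \dot{w}y)$ and integration with
respect to $x$ of the resulting integrand, $x^{a+r-1}(1 - x)^{b-1}$, following expansion in
binomial series allows us to write

(34)  $$\dot{G}(\dot{w}_0) = e^{-\lambda_1-\lambda_2}\sum_{r}\sum_{s}\frac{\lambda_1^{r}\lambda_2^{s}}{r!\,s!}\,[B(a + r, b)B(a + s, b)]^{-1}\int_{0}^{\infty} y^{a+s-1}(1+y)^{-(a+b+s)}\,(\dot{w}_0y)^{a+r}(1+\dot{w}_0y)^{-(a+r)}$$
$$\cdot\Bigl[\frac{1}{a+r} - \frac{(b-1)\,\dot{w}_0y}{(a+r+1)(1+\dot{w}_0y)} + \frac{(b-1)(b-2)(\dot{w}_0y)^2}{2!\,(a+r+2)(1+\dot{w}_0y)^2} - \cdots\Bigr]\,dy.$$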

When $b \ge 1$ is an integer, the series in square brackets will be finite. Furthermore,
it has been shown [8] that

$$\dot{G}(\dot{w}_0; a, n, \lambda_1, \lambda_2) \le \dot{G}(\dot{w}_0; a, n + \tfrac{1}{2}, \lambda_1, \lambda_2) \le \dot{G}(\dot{w}_0; a, n + 1, \lambda_1, \lambda_2)$$

for $n$ an integer. We shall then only consider the evaluation of (33) or (34) when
$b$ is an integer and obtain values of $\dot{G}(\dot{w}_0)$ by interpolation when $b$ is not an
integer. This can be attempted in several ways.

(i) Direct evaluation of $\dot{G}(\dot{w}_0)$ from (34). Evaluation of $\dot{G}(\dot{w}_0)$ from (34) will
depend on the evaluation of integrals of the form

(35)  $$h(\dot{w}_0; m, n, p) = \int_{0}^{\infty} y^{m-1}(1 + y)^{-n}(1 + \dot{w}_0y)^{-p}\,dy,$$

like $H(\dot{w})$ formerly defined and where, for fixed $r$ and $s$ in (34),
$m = 2a + r + s + j$, $n = a + b + s$, $p = a + r + j$, $j = 0, \ldots, (b - 1)$, $b$ an
integer. As for $H(\dot{w})$, $h(\dot{w}_0)$ may be written in terms of hypergeometric series,

$$h(\dot{w}_0) = B(m, n + p - m)\;{}_2F_1(p, m, n + p, 1 - \dot{w}_0), \qquad 0 \le \dot{w}_0 \le 1,$$

and

$$h(\dot{w}_0) = \dot{w}_0^{-m}\,B(m, n + p - m)\;{}_2F_1[n, m, n + p, (\dot{w}_0 - 1)/\dot{w}_0], \qquad 1 \le \dot{w}_0 \le \infty,$$

when $n + p - m > 0$, as will be the case here. If tables of the hypergeometric
function were available, we could evaluate $\dot{G}(\dot{w}_0)$ through evaluations of $h(\dot{w}_0)$,
using a finite number of terms of (34), which may be shown to converge
through a method similar to that used in Sec. 3(i).
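
In place of tables of the hypergeometric function, the two series for $h(\dot{w}_0)$ can be summed directly and compared with a numerical evaluation of (35). The Python sketch below is an illustration under our own assumptions: the series is truncated at a fixed number of terms (adequate only when the argument is well inside the unit circle), and the quadrature uses the substitution $y = t/(1-t)$. All names are ours.

```python
import math


def beta(x, y):
    """Complete beta function B(x, y), computed through log-gamma."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))


def hyp2f1(alpha, bet, gamma, z, terms=500):
    """Gauss hypergeometric series; this sketch assumes |z| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (alpha + k) * (bet + k) / ((gamma + k) * (k + 1)) * z
    return total


def h_series(w0, m, n, p):
    """h(w0; m, n, p) from the two hypergeometric forms following (35)."""
    if w0 <= 1.0:
        return beta(m, n + p - m) * hyp2f1(p, m, n + p, 1.0 - w0)
    return w0 ** (-m) * beta(m, n + p - m) * hyp2f1(n, m, n + p, (w0 - 1.0) / w0)


def h_quadrature(w0, m, n, p, steps=2000):
    """h(w0; m, n, p) from (35) by Simpson's rule after y = t / (1 - t)."""
    def f(t):
        return t ** (m - 1) * (1 - t) ** (n + p - m - 1) * (1 + (w0 - 1) * t) ** (-p)
    h = 1.0 / steps
    total = f(0.0) + f(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0
```

The values $m = 3$, $n = 3$, $p = 2$ used in checking correspond to $a = 1$, $b = 2$, $r = s = 0$, $j = 1$ in the notation of (35).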

As an alternative to the use of hypergeometric series, $h(\dot{w}_0)$ may be
transformed to a form like (9) and evaluated through the use of binomial expansions
in the integrand.

Use of (34) to evaluate $\dot{G}(\dot{w}_0)$ will in most instances require evaluation of a
large number of terms to attain even 2- or 3-decimal accuracy.

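(ii) Integration for $\dot{G}(\dot{w}_0)$ after summation in (34). When $b$ is small (say $b < 6$),
the following method based on interchange of the order of summation and
integration in (34) gives good accuracy in evaluating $\dot{G}(\dot{w}_0)$. Observe that

$$\sum_{s}\Bigl(\frac{\lambda_2 y}{1+y}\Bigr)^{s}\frac{\Gamma(a+b+s)}{s!\,\Gamma(a+s)} = \frac{\Gamma(a+b)}{\Gamma(a)}\;{}_1F_1\Bigl(a+b,\ a,\ \frac{\lambda_2 y}{1+y}\Bigr)$$
$$= \frac{\Gamma(a+b)}{\Gamma(a)}\,\exp\Bigl(\frac{\lambda_2 y}{1+y}\Bigr)\;{}_1F_1\Bigl(-b,\ a,\ -\frac{\lambda_2 y}{1+y}\Bigr),$$

the latter form based on the use of Kummer's relation ([2], p. 253). This second
confluent hypergeometric series is finite. We can also write

$$\sum_{r}\Bigl(\frac{\lambda_1\dot{w}_0y}{1+\dot{w}_0y}\Bigr)^{r}\frac{\Gamma(a+b+r)}{r!\,\Gamma(a+r)}\Bigl[\frac{1}{a+r} - \frac{b-1}{a+r+1}\Bigl(\frac{\dot{w}_0y}{1+\dot{w}_0y}\Bigr) + \cdots\Bigr]$$
$$= \sum_{r}\Bigl(\frac{\lambda_1\dot{w}_0y}{1+\dot{w}_0y}\Bigr)^{r}\frac{1}{r!}\,\bigl[P_0 + rP_1 + r(r-1)P_2 + \cdots + r(r-1)\cdots(r-b+2)P_{b-1}\bigr]$$
$$= \exp\Bigl(\frac{\lambda_1\dot{w}_0y}{1+\dot{w}_0y}\Bigr)\Bigl[P_0 + \Bigl(\frac{\lambda_1\dot{w}_0y}{1+\dot{w}_0y}\Bigr)P_1 + \cdots + \Bigl(\frac{\lambda_1\dot{w}_0y}{1+\dot{w}_0y}\Bigr)^{b-1}P_{b-1}\Bigr],$$

where $P_i$, $i = 0, \ldots, (b - 1)$, are polynomials in $\dot{w}_0y/(1 + \dot{w}_0y)$ independent of
$r$ and determined by appropriate grouping of terms in the first form above. With
the substitution of the results of this paragraph in (34),

(36)  $$\dot{G}(\dot{w}_0) = \frac{\Gamma(a+b)}{\Gamma(a)[\Gamma(b)]^2}\,\dot{w}_0^{\,a}\int_{0}^{\infty} y^{2a-1}(1+y)^{-(a+b)}(1+\dot{w}_0y)^{-a}\,\exp\Bigl(-\frac{\lambda_1}{1+\dot{w}_0y} - \frac{\lambda_2}{1+y}\Bigr)\;{}_1F_1\Bigl(-b,\ a,\ -\frac{\lambda_2 y}{1+y}\Bigr)$$
$$\cdot\Bigl[P_0 + \Bigl(\frac{\lambda_1\dot{w}_0y}{1+\dot{w}_0y}\Bigr)P_1 + \cdots + \Bigl(\frac{\lambda_1\dot{w}_0y}{1+\dot{w}_0y}\Bigr)^{b-1}P_{b-1}\Bigr]\,dy.$$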

Evaluation of $\dot{G}(\dot{w}_0)$ now depends on evaluating a finite number of integrals
of the form

(37)  $$I(\dot{w}_0; \lambda, p, q) = \int_{0}^{\infty}\exp\Bigl[-\lambda\Bigl(\frac{1}{1+y} + \frac{1}{1+\dot{w}_0y}\Bigr)\Bigr](1 + y)^{-p}(1 + \dot{w}_0y)^{-q}\,dy,$$

where $p$ and $q$ are integers or half-integers and when $\lambda_1 = \lambda_2 = \lambda$, the case of
most immediate interest. Reduction of (36) to integrals like (37) follows when
powers of $y$ and powers of $\dot{w}_0y$ in the numerator of the integrand of (36) are
written respectively as powers of $[(1 + y) - 1]$ and $[(1 + \dot{w}_0y) - 1]$ and expanded
in finite binomial expansions. Recursion formulas reduce evaluations of (37)
to forms depending on five basic integrals, $I(\dot{w}_0; \lambda, 0, 2)$, $I(\dot{w}_0; \lambda, 1, 1)$,
$I(\dot{w}_0; \lambda, 3/2, 1/2)$, $I(\dot{w}_0; \lambda, 1/2, 3/2)$, and $I(\dot{w}_0; \lambda, 5/2, 1/2)$. Some of the recursion
formulas are

$$I(\dot{w}_0; \lambda, p, q) = (\dot{w}_0 - 1)^{-1}[\dot{w}_0 I(\dot{w}_0; \lambda, p - 1, q) - I(\dot{w}_0; \lambda, p, q - 1)],$$

$$I(\dot{w}_0; \lambda, 0, q) = \lambda^{-1}[(q - 2)I(\dot{w}_0; \lambda, 0, q - 1) - \dot{w}_0^{-1}e^{-2\lambda}] - \dot{w}_0^{-1}I(\dot{w}_0; \lambda, 2, q - 2),$$

$$I(\dot{w}_0; \lambda, p, 0) = \lambda^{-1}[(p - 2)I(\dot{w}_0; \lambda, p - 1, 0) - e^{-2\lambda}] - \dot{w}_0 I(\dot{w}_0; \lambda, p - 2, 2),$$

$$I(\dot{w}_0; \lambda, 2, 0) = \lambda^{-1}[1 - e^{-2\lambda}] - \dot{w}_0 I(\dot{w}_0; \lambda, 0, 2).$$
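
The recursion formulas can be verified numerically for particular arguments. The Python sketch below is our own check and not the Taylor-series method of the paper: it evaluates (37) by Simpson's rule after the substitution $y = t/(1-t)$, and for simplicity it is restricted to $p + q \ge 2$ so that the transformed integrand stays bounded. The names `simpson` and `basic_integral` are ours.

```python
import math


def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def basic_integral(w0, lam, p, q):
    """I(w0; lam, p, q) of (37), evaluated on (0, 1) via y = t / (1 - t)."""
    if p + q < 2:
        raise ValueError("this sketch assumes p + q >= 2")

    def integrand(t):
        u = 1.0 - t          # equals 1 / (1 + y)
        d = u + w0 * t       # equals (1 + w0 * y) / (1 + y)
        return math.exp(-lam * (u + u / d)) * u ** (p + q - 2) * d ** (-q)

    return simpson(integrand, 0.0, 1.0)


if __name__ == "__main__":
    w0, lam = 3.0, 1.0
    left = basic_integral(w0, lam, 2, 2)
    right = (w0 * basic_integral(w0, lam, 1, 2) - basic_integral(w0, lam, 2, 1)) / (w0 - 1.0)
    print(left, right)
```

The first formula is purely algebraic, following from $\dot{w}_0(1 + y) - (1 + \dot{w}_0y) = \dot{w}_0 - 1$; the others follow on integrating the derivative of the exponential factor times a power of $(1 + y)$ or $(1 + \dot{w}_0y)$.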


The basic integrals may be evaluated by expanding their integrands in Taylor
series about $\dot{w}_0 = 1$. The coefficients in the resultant expansions depend on
integrals of the form

$$\int_{0}^{\infty}\frac{y^{r}}{(1+y)^{s}}\,e^{-2\lambda/(1+y)}\,dy = \int_{0}^{1} e^{-2\lambda u}(1-u)^{r}u^{s-r-2}\,du,$$

which may be evaluated directly. When $\dot{w}_0 \ge 2$, it is advantageous to obtain
the series in an expansion in terms of $1/\dot{w}_0$ about $1/\dot{w}_0 = 1$. This is not a different
problem since it is easy to show that $I(1/\dot{w}_0; \lambda, p, q) = \dot{w}_0 I(\dot{w}_0; \lambda, q, p)$. Schumann
[8] has shown that if $t_i$, $i = 0, \ldots, r$, are terms in the Taylor expansion of
$I(1/\dot{w}_0; \lambda, p, q)$ and if $R_{r+1}$ is the remainder after these $(r + 1)$ terms,
1 a woR,41 ‘ty ( Wo —_ i > t,1/[wWotr—s —_ (ri ~—_ 1)é,]. 


This relationship was used to determine the accuracy of evaluations of the basic 
integrals. For brevity we have omitted the demonstration of the stated in- 
equalities. 

This method (ii) was used to obtain the values of $\dot{G}(\dot{w}_0)$ given in Table II
and for the comparisons in Table IV. It was found that the Taylor series for the
basic integrals converge slowly and as many as 18 terms were required to obtain
6 decimal accuracy in some of the basic integrals. Since the five basic integrals
are used in recursion formulas, errors tend to be magnified; however, the
combination of integrals $I(\dot{w}_0; \lambda, p, q)$ required to evaluate $\dot{G}(\dot{w}_0)$ enter in such a
way that errors in the $I(\dot{w}_0; \lambda, p, q)$ are to a large extent compensating for each
other and reasonable accuracy can be obtained.

The computation required to construct even a limited table of values of $\dot{G}(\dot{w}_0)$
is very extensive and an approximate method was found.

(iii) Approximation to $\dot{G}(\dot{w}_0)$ using $G(w_0)$. We have indicated that we shall
approximate to $\dot{g}(\dot{w})$ with $\lambda_1 = \lambda_2 = \lambda$ and $2a$ and $2b$ degrees of freedom by using
$g(w)$ with $2a'$ and $2b$ degrees of freedom. We shall extend this idea to yield an
approximation to $\dot{G}(\dot{w}_0)$ using $G(w_0)$ where

(38)  $$G(w_0) = \int_{0}^{w_0} g(w)\,dw.$$

Evaluations of $G(w_0)$ will be considered in the next section. The results in Table
IV indicate that the approximation will be adequate for most practical situations.

Extension. The discussion of $\dot{G}(\dot{w}_0)$ has so far been limited to the case for similar
experiments. The extension to $\dot{g}(\dot{w})$ in (22) is straightforward. Difficulties in
computation are however almost prohibitive.
For the general case, the form of $\dot{G}(\dot{w}_0)$ like (34) is

(39)  $$\dot{G}(\dot{w}_0; a_1, b_1, a_2, b_2, \lambda_1, \lambda_2) = e^{-\lambda_1-\lambda_2}\sum_{r}\sum_{s}\frac{\lambda_1^{r}\lambda_2^{s}}{r!\,s!}\,[B(a_1 + r, b_1)B(a_2 + s, b_2)]^{-1}\int_{0}^{\infty} y^{a_2+s-1}(1+y)^{-(a_2+b_2+s)}\,(y\dot{w}_0/c)^{a_1+r}(1+y\dot{w}_0/c)^{-(a_1+r)}$$
$$\cdot\Bigl[\frac{1}{a_1+r} - \frac{(b_1-1)\,y\dot{w}_0/c}{(a_1+r+1)(1+y\dot{w}_0/c)} + \cdots\Bigr]\,dy.$$

This distribution function has the same form as (34) and methods of evaluation
discussed for (34) could therefore be applied.

7. The probability integral of $g(w)$. Results on the probability integral of
$\dot{g}(\dot{w})$ with $\lambda_1 = \lambda_2 = \lambda$ carry over to the probability integral of $g(w)$ when $\lambda = 0$.
For similar experiments the form comparable to (34) is

(40)  $$G(w_0; a, b) = [B(a, b)]^{-2}\int_{0}^{\infty} y^{a-1}(1 + y)^{-(a+b)}\,(w_0y)^{a}(1 + w_0y)^{-a}\Bigl[\frac{1}{a} - \frac{(b-1)\,w_0y}{(a+1)(1+w_0y)} + \frac{(b-1)(b-2)(w_0y)^2}{2!\,(a+2)(1+w_0y)^2} - \cdots\Bigr]\,dy.$$

Again the series in square brackets is finite when $b$ is an integer.
$G(w_0)$ can be obtained in a form like (36) when $b$ is an integer and depends on
integrals

(41)  $$I(w_0; 0, p, q) = I(w_0; p, q) = \int_{0}^{\infty}(1 + y)^{-p}(1 + w_0y)^{-q}\,dy.$$

For example, when $a = 2$, $b$ an integer,

$$G(w_0) = 1 - b(b + 1)[(b + 1)I(w_0; b + 1, b) - (b + 1)I(w_0; b + 2, b) - bI(w_0; b + 1, b + 1) + bI(w_0; b + 2, b + 1)].$$

Similar expressions may be found for other values of $a$, given $b$ an integer.
Interpolation for $b$ not an integer is again possible or direct evaluation may be used.
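
The expression for $a = 2$ can be exercised numerically. Since $w$ and $1/w$ have the same distribution for similar experiments, $G(w_0) + G(1/w_0) = 1$ and $G(1) = 1/2$, which gives a convenient check. The Python sketch below is ours; the integrals (41) are evaluated by Simpson's rule after $y = t/(1-t)$ rather than by the recursion formulas, and the function names are hypothetical.

```python
def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def I0(w0, p, q):
    """I(w0; p, q) of (41); on (0, 1) the integrand is
    (1 - t)^(p + q - 2) * (1 + (w0 - 1) t)^(-q), assuming p + q >= 2."""
    def integrand(t):
        return (1.0 - t) ** (p + q - 2) * (1.0 + (w0 - 1.0) * t) ** (-q)
    return simpson(integrand, 0.0, 1.0)


def G_a2(w0, b):
    """Distribution function of the central ratio for a = 2, b an integer."""
    bracket = ((b + 1) * I0(w0, b + 1, b) - (b + 1) * I0(w0, b + 2, b)
               - b * I0(w0, b + 1, b + 1) + b * I0(w0, b + 2, b + 1))
    return 1.0 - b * (b + 1) * bracket


if __name__ == "__main__":
    for w0 in (0.25, 1.0, 4.0):
        print(w0, G_a2(w0, 2))
```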


TABLE I
Values of $w_0$ for similar experiments such that $1 - G(w_0) = .05$

TABLE II
Values of $\dot{w}_0$ such that $1 - \dot{G}(\dot{w}_0) = .05$

TABLE III
Values of $w_0$ such that $1 - G(w_0; a', b) = .05$

The recursion formulas previously given may be applied when $\lambda$ is set equal to
zero. The computation of $G(w_0)$ is considerably easier than that for $\dot{G}(\dot{w}_0)$.

In certain special cases alternative methods of evaluating $G(w_0)$ are available
based on the special cases indicated in Sec. 5. For example, when $b = a + 1/2$,
$G(w_0) = I_x(2a, 2a)$ where $I_x$ is the incomplete beta function with $x = \sqrt{w_0}/(1 + \sqrt{w_0})$.
Other special cases with $b = a - 1/2$, $b = a + 3/2$, etc., were used as a check on
some of the computing.
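
For the special case $b = a + 1/2$, an upper percentage point can be located by a simple search on the incomplete beta function. The sketch below is a hedged illustration of such a search, not the authors' "trial and error" computation: the incomplete beta function is approximated by Simpson's rule (assuming $2a \ge 1$), the search is a bisection on a geometric scale, and the bracketing interval `lo`, `hi` is an arbitrary assumption.

```python
import math


def beta(x, y):
    """Complete beta function B(x, y), computed through log-gamma."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))


def reg_inc_beta(x, p, q, n=4000):
    """Regularised incomplete beta I_x(p, q) by Simpson's rule (p, q >= 1)."""
    def f(t):
        return t ** (p - 1) * (1 - t) ** (q - 1)
    h = x / n
    total = f(0.0) + f(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0 / beta(p, q)


def G_half_step(w0, a):
    """G(w0) for the central ratio when b = a + 1/2."""
    x = math.sqrt(w0) / (1.0 + math.sqrt(w0))
    return reg_inc_beta(x, 2 * a, 2 * a)


def upper_point(a, tail=0.05, lo=1.0, hi=1.0e6):
    """Solve 1 - G(w0) = tail for w0 by bisection on a geometric scale."""
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        if 1.0 - G_half_step(mid, a) > tail:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

For $a = 1/2$, $b = 1$ the relation reduces to $G(w_0) = \sqrt{w_0}/(1 + \sqrt{w_0})$, so the 5 per cent point solves $\sqrt{w_0} = 19$.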

The general form like (40) based on (24) is

(42)  $$G(w_0; a_1, b_1, a_2, b_2) = [B(a_1, b_1)B(a_2, b_2)]^{-1}\int_{0}^{\infty} y^{a_2-1}(1 + y)^{-(a_2+b_2)}\,(yw_0/c)^{a_1}(1 + yw_0/c)^{-a_1}\Bigl[\frac{1}{a_1} - \frac{(b_1-1)\,yw_0/c}{(a_1+1)(1+yw_0/c)} + \cdots\Bigr]\,dy.$$

8. Tables of $G(w_0)$ and $\dot{G}(\dot{w}_0)$.

Table I. Values of $w_0$ such that $1 - G(w_0) = .05$ are given in Table I for similar
experiments for ranges of values of $a$ and of $b$ wide enough to meet most practical
situations. The table is essentially restricted to values of the parameters $a \le b$,
for $g(w; a, b) = g(w; b, a)$, and indeed $a \le b$ again covers most situations
commonly met.

The formulas given in Secs. 6 and 7 were used in constructing Table I, and
"trial and error" methods were used to arrive at the appropriate values of $w_0$.
Tables of the incomplete beta function were used to check certain entries in view
of the special cases mentioned in Sec. 7.

TABLE IV
Bounds on $1 - \dot{G}(w_0; a, b, \lambda)$ for certain values of $w_0$ in Table III

$\dot{G}(w_0; a, b, \lambda)$

.945-.950 
.935-.940 
.940-.945
.940-.945 
.945-.950 
.945-.950 
.945-.950 
.940-.945 
.940-.945 
.940-.945




Table II. Some of the values of $\dot{w}_0$ such that $1 - \dot{G}(\dot{w}_0) = .05$ are given in
Table II. For this table, $\lambda_1 = \lambda_2 = \lambda$, and $\dot{F}_1$ and $\dot{F}_2$ defining $\dot{w}$ were both taken
to have $2a$ and $2b$ degrees of freedom. Formula (36) was used after the basic
integrals $I(\dot{w}_0; \lambda, 0, 2)$ and $I(\dot{w}_0; \lambda, 1, 1)$ were evaluated in the manner described
in Sec. 6(ii). Values of $\lambda$, $a$, $b$ in Table II are too limited to make the table of real
practical use and its main purpose is for comparison with Table III.

Table III. We approximate to $\dot{G}(\dot{w}_0; a, b, \lambda)$ using $G(w_0; a', b)$ where $a'$ is
defined in (6). This approximation has been used to obtain values of $w_0$ listed in
Table III such that $1 - G(w_0; a', b) = .05$, the values being obtained by
interpolation in Table I. Table III then contains values of $w_0$ appropriate for
comparison with values of $\dot{w}_0$ in Table II. We see immediately that values of $w_0$ and
$\dot{w}_0$ agree quite well. We can more easily assess the importance of the small
differences observed by examining Table IV.

Table IV. In Sec. 6(ii), bounds on the error in computing $\dot{G}(\dot{w}_0; a, b, \lambda)$ were
stated. In order to further compare the approximation to values of $\dot{w}_0$ in Table II
by values of $w_0$ in Table III, we have considered the values of $w_0$ in the first three
columns in Table III and evaluated $1 - \dot{G}(w_0; a, b, \lambda)$ as indicated for Table II
and in Sec. 6(ii). Then bounds on $1 - \dot{G}(w_0)$ are given in Table IV. Each value of
$w_0$ is such that $1 - G(w_0; a', b) = .05$. From Table IV it is clear that the values
$w_0$ are sufficiently close to the appropriate values of $\dot{w}_0$ to be satisfactory for
most purposes.

Some general comments based on Tables II, III, and IV are in order.

(i) Values of $\dot{G}(\dot{w}_0; a, b, \lambda)$ and $G(w_0; a', b)$ are fairly stable even for
considerable variation in values of $\lambda$. This implies that it will have little effect in
applications if we enter tables at a value of $\lambda$ somewhat different from its true
(and usually unknown) value.

(ii) Values given in Table III are close enough to the corresponding values in
Table II, even for small degrees of freedom, to make their use meaningful. Since
the construction of percentage points of $g(w)$ is much easier than the construction
of such values for $\dot{g}(\dot{w})$, it was decided that Table I be constructed and that its
use will be satisfactory.

9. The use of Table I.
(i) Tests of hypotheses on sensitivity. In the comparison of the sensitivities of
identical experiments, we shall be interested in tests of the hypothesis,

(43)  $H_0\colon \lambda_1 = \lambda_2 = \lambda,$

against one-sided and two-sided alternatives,

(44)  $H_a\colon$ (1) $\lambda_1 > \lambda_2$,
           (2) $\lambda_1 \ne \lambda_2$.

The test statistic is $\dot{w}$ and we limit consideration to the case of identical
experiments, $\dot{w} = \dot{F}_1/\dot{F}_2$, $\dot{F}_1$ and $\dot{F}_2$ independent noncentral variance ratios with
parameters $\lambda_1$ and $\lambda_2$, respectively, and $2a$ and $2b$ degrees of freedom. The test
procedure will be to reject $H_0$ with significance level $\alpha$ for $H_a$:(1) when $\dot{w} > \dot{w}_0(\alpha)$
and to reject $H_0$ with significance level $2\alpha$ for $H_a$:(2) when $\dot{w} > \dot{w}_0(\alpha)$ or
$1/\dot{w} > \dot{w}_0(\alpha)$. The other one-sided alternative, $H_a\colon \lambda_1 < \lambda_2$, is included under $H_a$:(1)
by interchange of definitions of $\dot{F}_1$ and $\dot{F}_2$.

It is apparent from $\dot{g}(\dot{w})$ that $\lambda$ enters as a nuisance parameter in the
calculation of $\dot{w}_0(\alpha)$. Fortunately $\dot{w}_0(\alpha)$ is not greatly affected by changes in $\lambda$ and
consequently it should be satisfactory to estimate $\lambda$ by taking the average of
estimates of $\lambda_1$ and $\lambda_2$. Such estimates may be found through estimating $\lambda$ in (4)
through equating well known expectations of mean squares to observed mean
squares in the analysis of variance.

We use Table I to obtain an approximation to $\dot{w}_0(.05)$. It is necessary only
to compute $a'$ in (6) using $a$ and the estimate of $\lambda$ and then to interpolate in
Table I for $w_0$, the required approximation to $\dot{w}_0(.05)$. Since we are at present
limited to the use of Table I, we must take $\alpha = .05$.
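
The steps of the procedure may be sketched in code. The following Python fragment is an illustration under stated assumptions: the estimates of $\lambda_1$ and $\lambda_2$ and the critical value read from Table I are supplied by the user, the function names are ours, and the value 2.5 in the example call is a made-up placeholder, not an entry of Table I.

```python
def a_prime_for_table(a, lam1_hat, lam2_hat):
    """Average the two noncentrality estimates and return a' of (6)."""
    lam_hat = 0.5 * (lam1_hat + lam2_hat)
    return (a + lam_hat) ** 2 / (a + 2.0 * lam_hat)


def decide(F1, F2, w0):
    """Apply the rejection rules for (43) against (44).

    F1, F2 : observed variance ratios of the two identical experiments
    w0     : critical value interpolated from Table I at (a', b)
    The one-sided rule has level .05; the two-sided rule has level .10.
    """
    w = F1 / F2
    return {
        "w": w,
        "reject_one_sided": w > w0,
        "reject_two_sided": (w > w0) or (1.0 / w > w0),
    }


if __name__ == "__main__":
    print(a_prime_for_table(2.0, 1.5, 0.5))
    print(decide(6.0, 2.0, 2.5))  # 2.5 is a placeholder critical value
```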

(ii) Tests of hypotheses on multiple correlation coefficients. Consider $R$, the
multiple correlation coefficient of the dependent variable on $p$ independent variables,
in usual multiple regression with assumed nonstochastic independent variables.
Let $R$ be based on $N$ observation vectors. Then it is well known that $R^2/(1 - R^2)$
has the distribution $\dot{f}[R^2/(1 - R^2);\ p/2,\ (N - p - 1)/2,\ \lambda]$ defined in (2) and now

(45)  $\lambda = (a + b)\rho^2/(1 - \rho^2),$

where $\rho$ is the population multiple correlation coefficient estimated by $R$.
To compare two population multiple correlation coefficients in identical
regression experiments, we test

(46)  $H_0\colon \rho_1 = \rho_2$

against

(47)  $H_a\colon$ (1) $\rho_1 > \rho_2$,
           (2) $\rho_1 \ne \rho_2$.

This test is identical with that of hypotheses (43) and (44) upon proper
association of the parameters. We have redefined $\lambda$ in (45) and now require

(48)  $a = p/2,$
      $b = (N - p - 1)/2,$
      $\dot{w} = R_1^2(1 - R_2^2)/R_2^2(1 - R_1^2).$

$\lambda$, and hence $\rho_1$ and $\rho_2$, must again be estimated and we suggest the method
proposed by Snedecor ([10], p. 348). This method is given by

(49)  $\rho^2 \text{ (estimated)} = 1 - (1 - R^2)(a + b)/b$

in our notation.
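
Relations (45), (48), and (49) translate directly into a short computation. The Python sketch below uses function names of our own choosing and arbitrary illustrative inputs.

```python
def regression_parameters(p, N):
    """Return (a, b) of (48) for p independent variables and N observations."""
    return p / 2.0, (N - p - 1) / 2.0


def rho2_estimate(R2, a, b):
    """Estimate of the squared population coefficient from (49)."""
    return 1.0 - (1.0 - R2) * (a + b) / b


def noncentrality(rho2, a, b):
    """Noncentrality parameter of (45) for a given squared coefficient."""
    return (a + b) * rho2 / (1.0 - rho2)


def w_statistic(R2_1, R2_2):
    """Test statistic of (48) from the two observed squared coefficients."""
    return R2_1 * (1.0 - R2_2) / (R2_2 * (1.0 - R2_1))


if __name__ == "__main__":
    a, b = regression_parameters(2, 13)
    rho2 = rho2_estimate(0.5, a, b)
    print(a, b, rho2, noncentrality(rho2, a, b), w_statistic(0.6, 0.5))
```

The estimated noncentrality parameters for the two regressions would then be averaged and used to compute $a'$ from (6) before entering Table I.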

It is of interest to compare values of $\rho^2$ in identical multiple regressions, for
$R^2$ is commonly used as a measure of the fraction of the variation in the
dependent variable explained by regression on the independent variables.


10. Concluding remarks. The main effort in this paper has been devoted to
considerations on the distributions of $w$ and $\dot{w}$ for similar experiments. We have
indicated, however, the necessary generalizations for consideration of the
distributions of $w$ and $\dot{w}$ based on central or noncentral variance ratios with $2a_1$
and $2b_1$ and $2a_2$ and $2b_2$ degrees of freedom. For example, $G(w)$ in this general
situation is given in (42).

The main applications of this work are in the comparison of the sensitivities
of identical experiments and in the comparison of the squares of multiple
correlation coefficients from identical regressions [in the sense that $(a + b)$, $a$ and $b$ in
(45) and (48) are the same for both experiments]. Schumann [8] has suggested
that, when $a_1$ and $a_2$ and $b_1$ and $b_2$ differ only slightly and are moderately large,
an adequate approximation to $\dot{w}_0(.05)$ may still be obtained from Table I.

Consider the more general situation with $a_1 = a_2 = a$, $b_1$ not assumed equal to
$b_2$, and $k_1$ not assumed equal to $k_2$, $k_1$ and $k_2$ being values of $k$ in (4) for the
two independent experiments. Then, for a test on sensitivities, we would take
the null hypothesis to be

(50)  $H_0\colon \lambda_1/k_1 = \lambda_2/k_2 = \Lambda.$

Our test statistic would be $\dot{w} = \dot{F}_1/\dot{F}_2$ where $\dot{F}_i$, $i = 1, 2$, is a noncentral
variance ratio with $2a$ and $2b_i$ degrees of freedom and parameter of noncentrality
$\lambda_i$. If $\dot{F}_i$, $i = 1, 2$, is distributed approximately like $K_iF_i$, written $\dot{F}_i \sim K_iF_i$,
where $F_i$ is taken to have the central variance-ratio distribution with
$2a_i' = 2(a + \lambda_i)^2/(a + 2\lambda_i)$ and $2b_i$ degrees of freedom, we take

(51)  $\dot{w} \sim K_1F_1/K_2F_2 = (a + k_1\Lambda)F_1/(a + k_2\Lambda)F_2 = cu_1/u_2,$

where

(52)  $c = b_1(a + k_2\Lambda)(a + 2k_1\Lambda)/b_2(a + k_1\Lambda)(a + 2k_2\Lambda)$

in view of (50), (6), and the definition for $u$ parallel with (1). For the test on
sensitivities, we suggest the following procedure based on the apparent stability
of values of $w_0$ in Table I, but the suitability of this procedure is not now subject
to check. Estimate $c$ by first obtaining estimates of $\lambda_1$ and $\lambda_2$ from the separate
experiments and by using the average of the estimates of $\lambda_1/k_1$ and $\lambda_2/k_2$ to
estimate $\Lambda$. Use the distribution of $w = u_1/u_2$ for similar experiments taking
$a^* = (a_1' + a_2')/2$ for $a$ and $b^* = (b_1 + b_2)/2$ for $b$ and read the critical value $w_0$ from
Table I. The critical value of $\dot{w}$ is then taken to be $cw_0$ as an approximation. A
similar procedure may be used with the correlation coefficients. If $(a_1 + b_1) \ne (a_2 + b_2)$,
the hypothesis that $\rho_1 = \rho_2$ is not equivalent to the hypothesis that
$\lambda_1 = \lambda_2$ in view of the form for $\lambda$ in (45). The steps required for an approximation
are parallel to those discussed in this paragraph.
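
The constant $c$ of (52) and the resulting approximate critical value $cw_0$ are simple to compute once $\Lambda$ has been estimated. The sketch below is ours; the function names are hypothetical, the value of $w_0$ must come from Table I, and the inputs in the example call are arbitrary.

```python
def scale_constant(a, b1, b2, k1, k2, Lam):
    """Constant c of (52) for experiments with a1 = a2 = a."""
    num = b1 * (a + k2 * Lam) * (a + 2.0 * k1 * Lam)
    den = b2 * (a + k1 * Lam) * (a + 2.0 * k2 * Lam)
    return num / den


def approximate_critical_value(c, w0):
    """Approximate critical value for the ratio of noncentral variance ratios."""
    return c * w0


if __name__ == "__main__":
    c = scale_constant(2.0, 6.0, 3.0, 1.0, 2.0, 0.5)
    print(c, approximate_critical_value(c, 2.5))  # 2.5 is a placeholder for w0
```

When $k_1 = k_2$ and $b_1 = b_2$ the constant reduces to $c = 1$, and the procedure coincides with that for identical experiments.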

Extensions of Table I for values of $\alpha$ other than .05 are desired. Means of
extending Table I for values $\alpha = .025$, .01, and .005 are being investigated and it is
hoped that these extensions can eventually be obtained. Extensions of Table I
to the general case where $a_1 \ne a_2$ and $b_1 \ne b_2$ are not being considered. These
tables would only occasionally be required in applications.

We have suggested situations wherein this work will be useful. Examples in
taste testing, field experimentation, and regression have been worked out in
detail by Schumann and Bradley [9].

ON CHOOSING AN ESTIMATE OF THE SPECTRAL DENSITY FUNCTION
OF A STATIONARY TIME SERIES¹


By EMANUEL PARZEN


Stanford University

1. Introduction. The problem of estimating the spectral density function of a
stationary time series has been extensively discussed recently (see references).
The present period of research may be said to have commenced about 1945, when
Bartlett and Daniell (see [1]) pointed out that the periodogram needs to be
smoothed if it is to form a consistent estimate of the spectral density. About
1948-49, several consistent estimates were proposed by Bartlett [2] and Tukey
[15]. Later, Grenander [6] and Rosenblatt [9] considered a general class of
estimates of the spectral density. The present writer also considered in [11] and
[12] a general class of estimates, treating continuous parameter, as well as discrete
parameter, stationary time series.

In the work of Grenander, Rosenblatt, and ourselves, the mean square error
$E|f_T(\omega) - f(\omega)|^2$ is adopted as the figure of merit of an estimate $f_T(\omega)$ of the
spectral density function $f(\omega)$. In our paper [12], the asymptotic bias, asymptotic
variance, and asymptotic mean square error are computed for a certain general
class of estimates. Certain general conclusions are stated as to (1) the highest
order of consistency with which the spectral density function of a given stationary
time series, whose covariance function satisfies certain conditions, could be
estimated, using a suitable sequence of estimates of the form considered, and (2)
the order of consistency which a given sequence of estimates could achieve for
any stationary time series satisfying certain conditions. Conclusions of type
(2) were stated also by Grenander and Rosenblatt [9] for certain estimates whose
asymptotic bias, and consequently whose mean square error, they were able to
evaluate.

In a more recent paper [14], we carry these results a good deal further, and 
show how to construct estimates of the spectral density function which achieve 
a maximum order of consistency and a minimum asymptotic mean square error. 

Nevertheless, these results, as they stand now, cannot be said to constitute a 
practical solution to the problem of estimating the spectral density function 
of a stationary time series. For while many estimates have been proposed, little 
attention has been paid to the question of how to choose among them. Given 
any exponent α, 0 < α < 1, one may construct many estimates f_T^*(ω) having the
property that they are consistent of order T^{−α}, in the sense that their mean square
errors are such that lim_{T→∞} T^α E|f_T^*(ω) − f(ω)|² is finite and non-zero. The question
naturally arises how to form a most desirable (or optimum) estimate of
order of consistency T^{−α}. In this paper, especially in Sec. 5, we put forth certain
considerations which indicate that the usual discussions of this question (such 
as in Grenander and Rosenblatt [9], pp. 154-155) are not adequate to settle 
the question. We then put forth certain notions on how to compare two estimates 
of the spectral density to determine which is more desirable. More importantly, 
we indicate a possible method of designing a spectral analysis. 

It is to be emphasized that this paper is open to the criticism that it employs 
relations for samples of finite size which are true only in the limit. For this reason, 
it is to be regarded as an attempt to obtain, on somewhat heuristic grounds, a 
“practical’’ solution to the problem of estimating the spectral density function 
of a stationary time series. The paper contains no theorems. 

This paper is to be read as a sequel to our paper [12], whose results form the 
base of the present paper. We employ the definitions, assumptions, and nota- 
tions of [12], the most important of which will be explained here as they arise. 

We discuss only the case of continuous parameter time series, since this seems 
to us to be the case of greatest physical interest. In our opinion, in considering a 
discrete parameter time series, one should always keep in mind the continuous 
parameter time series from which the discrete parameter one was obtained by 
means of sampling at discrete times. An interesting problem, which is briefly 
discussed in [13] and [14], is the relation which exists between the problems of 
estimating the spectral density of a continuous parameter stationary time series, 
and estimating the spectral density of a discrete parameter time series obtained 
by sampling the continuous parameter one at discrete times. 


2. A class of estimates of the spectral density function. Consider a continuous 
parameter stationary time series x(t), with mean m = E[x(t)], and integrable
covariance function

(2.1)    R(v) = E[y(t)\,y(t+v)] = \int_{-\infty}^{\infty} e^{iv\omega} f(\omega)\,d\omega,


where y(t) = x(t) − m. One calls f(ω) the spectral density function of x(t).
Let x(t) be observed for 0 ≤ t ≤ T. Let Y_T(t) denote the deviations from the
sample mean, defined by

(2.2)    Y_T(t) = x(t) - \frac{1}{T}\int_0^T x(t)\,dt,    0 \le t \le T,
                = 0,    otherwise.


Let R_T(v) denote the sample covariance function, defined by

(2.3)    R_T(v) = \frac{1}{T}\int_0^{T-|v|} Y_T(t)\,Y_T(t+|v|)\,dt,    |v| \le T,
                = 0,    otherwise.
Its Fourier transform

(2.4)    f_T(\omega) = \frac{1}{2\pi}\int_{-T}^{T} e^{-iv\omega} R_T(v)\,dv = \frac{1}{2\pi T}\left|\int_0^T e^{-it\omega}\,Y_T(t)\,dt\right|^2

may be called the sample spectral density function or periodogram.
We consider estimates of the spectral density function of the form

(2.5)    f_T^*(\omega) = \frac{1}{2\pi}\int_{-T}^{T} e^{-iv\omega}\,k(B_T v)\,R_T(v)\,dv,

where the function k(u), called a covariance averaging kernel, is even, bounded,
square integrable, k(0) = 1, and |u|^{1/2+ε} |k(u)| is bounded in u, for some
ε > 0. The constants B_T are assumed to tend to 0 as T → ∞ in such a way
that T B_T → ∞.

In [12] it is shown that the properties of the estimates f_T^*(ω) depend on k(u)
and B_T in the following way. The variance σ²[f_T^*(ω)] = E|f_T^*(ω) − E f_T^*(ω)|²
satisfies

(2.6)    \lim_{T\to\infty} T B_T\,\sigma^2[f_T^*(\omega)] = \int_{-\infty}^{\infty} k^2(u)\,du\; f^2(\omega)\,\{1 + \delta(0, \omega)\},

where δ(ω_1, ω_2) = 1 or 0 according as ω_1 = ω_2 or ω_1 ≠ ω_2. The bias b[f_T^*(ω)] =
E f_T^*(ω) − f(ω) satisfies

(2.7)    \lim_{T\to\infty} B_T^{-r}\,b[f_T^*(\omega)] = -k^{(r)} f^{(r)}(\omega)


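if r > 0 is such that

(2.8)    k^{(r)} = \lim_{u\to 0} \frac{1 - k(u)}{|u|^r}

is finite,

(2.9)    f^{(r)}(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-iv\omega}\,|v|^r R(v)\,dv

exists as an absolutely summable integral, and

(2.10)   0 < \liminf_{T\to\infty} T B_T^{1+2r} \le \limsup_{T\to\infty} T B_T^{1+2r} < \infty.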


The function f^{(r)}(ω) is to be regarded as a generalized rth derivative of the spectral
density function f(ω).

If a kernel k(u) has the property that there is a unique positive number r
such that k^{(r)} exists and is non-zero, then r is defined to be the characteristic
exponent of the kernel k(u), and k^{(r)} is defined to be the characteristic coefficient.
3. On achieving a figure of merit with minimum observation time. As a figure
of merit of the estimate f_T^*(ω) one may use its mean square percentage error
η²[f_T^*(ω)], defined by (for f(ω) > 0),

(3.1)    \eta^2[f_T^*(\omega)] = \frac{E|f_T^*(\omega) - f(\omega)|^2}{f^2(\omega)} = \frac{\sigma^2[f_T^*(\omega)]}{f^2(\omega)}\left\{1 + \frac{b^2[f_T^*(\omega)]}{\sigma^2[f_T^*(\omega)]}\right\}.


Another figure of merit one may use is the Gaussian range of percentage error, 
ae 
Al[fr(w)|, defined by 


‘ ‘ oo al fr(w)] f bl fr (w)] | 
(3.2) Alfr()] = vp fo) 1! + y olftla)ih ’ 


where 7, is the p percentile of the normal distribution, defined by the relation 


| 8? dy = Vm (9/2). 

Yp 

It was shown in [13] that the use of the Gaussian range of percentage error leads 
qualitatively to the same conclusions as does the mean square percentage error. 
In order not to overload the present paper, we merely mention the existence of 
the notion of Gaussian range of percentage error, but do not discuss its proper- 
ties, or the motivation for its consideration. 

We now make the crucial simplification on which the discussion of this paper
is based. We suppose that the relations (2.6) and (2.7), which are valid in the
limit as T → ∞, may be written as equations valid for finite values of T. We
then obtain the following expression for the mean square percentage error (which
we write only for the case ω > 0 in order to drop the term 1 + δ(0, ω)):

(3.3)    \eta^2[f_T^*(\omega)] = \frac{1}{T B_T}\int_{-\infty}^{\infty} k^2(u)\,du \left\{1 + \frac{T B_T^{1+2r}\,[k^{(r)} f^{(r)}(\omega)/f(\omega)]^2}{\int_{-\infty}^{\infty} k^2(u)\,du}\right\}.


Now for a given choice of the covariance averaging kernel k(u), and length
of observation time T, (3.3) defines η² as a function of B_T. Similarly, for fixed
k(u) and η², (3.3) defines T implicitly as a function of B_T. One may solve for T
explicitly, and one obtains

(3.4)    T = \frac{B_T^{-1}\int_{-\infty}^{\infty} k^2(u)\,du\,\{1 + \delta(0, \omega)\}}{\eta^2 - |k^{(r)}|^2\,[B_T/\lambda_r(\omega)]^{2r}},

where we define the quantity λ_r(ω) by

(3.5)    \lambda_r(\omega) = \left|\frac{f(\omega)}{f^{(r)}(\omega)}\right|^{1/r}.

It is clear that λ_r(ω) has the dimensions of bandwidth (i.e., of the reciprocal of
time). It may be shown, by a consideration of examples, that λ_r(ω) is related to
the notion of bandwidth of a spectrum as it is usually defined in the physical
literature. Consequently, λ_r(ω) may be interpreted as an extension of the notion
of bandwidth, and we call λ_r(ω) the spectral bandwidth² of order r at the frequency ω.
Equation (3.4) gives the observation time required in order that the estimate
f_T^*(ω), given by (2.5), with a specified value of the constant B_T, have a mean
square percentage error equal to η². We now consider kernels k(u), and r > 0
such that k^{(r)} ≠ 0. One may determine the value B_min of B_T which minimizes the
observation time T, and the value T_min of T at this minimum. One obtains

(3.6)    T_{\min} = \frac{1}{\lambda_r(\omega)}\left(1 + \frac{1}{2r}\right) C(r)\,T(k)\,\frac{1}{\eta^{2+(1/r)}},

(3.7)    B_{\min} = \lambda_r(\omega)\left[\frac{\eta}{|k^{(r)}|}\right]^{1/r}\frac{1}{C(r)},

where we define, for a kernel k(u) for which k^{(r)} ≠ 0,

(3.8)    T(k) = |k^{(r)}|^{1/r}\int_{-\infty}^{\infty} k^2(u)\,du

and where, for r > 0,

(3.9)    C(r) = (1 + 2r)^{1/(2r)}.

One could also give a formula for the minimum mean square percentage error
η_min obtainable with a fixed observation time T (compare Grenander and Rosenblatt
[9], pp. 154-155). However, there is no need to write this formula explicitly,
for a calculation of η_min shows that it may be obtained from (3.6) by replacing
T_min by T, and η by η_min.
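The minimization leading to (3.6) and (3.7) can be checked numerically. The following Python sketch assumes the finite-T reading of (3.4) for ω ≠ 0, namely T = ∫k²(u)du / {B_T (η² − |k^{(r)}|² [B_T/λ_r(ω)]^{2r})}, together with C(r) = (1 + 2r)^{1/(2r)}. The function names, argument names, and numerical values are illustrative assumptions and are not taken from the paper.

```python
def obs_time(B, eta2, r, k_r, int_k2, lam):
    """Observation time T implied by (3.4) for omega != 0 (assumed reading)."""
    bias_sq = (k_r ** 2) * (B / lam) ** (2 * r)   # squared percentage bias
    if bias_sq >= eta2:
        raise ValueError("bias alone already exceeds the target error")
    return int_k2 / (B * (eta2 - bias_sq))


def optimum(eta2, r, k_r, int_k2, lam):
    """Return (B_min, T_min) following (3.7) and (3.6)."""
    C = (1 + 2 * r) ** (1 / (2 * r))            # (3.9)
    T_k = abs(k_r) ** (1 / r) * int_k2          # (3.8)
    eta = eta2 ** 0.5
    B_min = lam * (eta / abs(k_r)) ** (1 / r) / C
    T_min = (1 + 1 / (2 * r)) * C * T_k / (lam * eta ** (2 + 1 / r))
    return B_min, T_min


# Bartlett kernel: r = 1, k^(1) = 1, integral of k^2 equals 2/3.
# lam = 1.0 and eta2 = 0.01 are arbitrary illustrative values.
B_min, T_min = optimum(eta2=0.01, r=1, k_r=1.0, int_k2=2 / 3, lam=1.0)
print(B_min, T_min, obs_time(B_min, 0.01, 1, 1.0, 2 / 3, 1.0))
```

Evaluating obs_time at values of B slightly above or below B_min gives a larger observation time, which is the sense in which B_min is optimal under the stated approximation.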

Equation (3.6) may be used to make a comparison of the effect of using differ- 
ent kernels k(u). Before making this comparison, we introduce some possible 
averaging kernels. 


4. Some possible covariance averaging kernels. There are very large numbers
of possible functions which one may consider as possible covariance averaging
kernels k(u) in the formula (2.5) for the estimated spectral density function
f_T^*(ω).


To begin with, one may consider the following 3-parameter families of
functions.

The algebraic family, defined for r > 0, γ > 0, and 0 < μ ≤ 1/γ:

(4.1)    k_A(u; \gamma, \mu, r) = 1 - (\gamma|u|)^r    for |u| \le \mu,
                                = 0    otherwise.


The cosine family, defined for r > 0, γ > 0, and 0 < μ ≤ π/2γ:

(4.2)    k_C(u; \gamma, \mu, r) = \tfrac{1}{2}\{1 + \cos[2(\gamma|u|)^{r/2}]\}    for |u| \le \mu,
                                = 0    otherwise.

² J. W. Tukey has suggested the alternative name of spectral bandscale, rather than
spectral bandwidth.
The exponential family, defined for r > 0, γ > 0, and μ > 0:

(4.3)    k_E(u; \gamma, \mu, r) = e^{-(\gamma|u|)^r}    for |u| \le \mu,
                                = 0    otherwise.


The geometric family, defined for r > 0, γ > 0, and μ > 0:

(4.4)    k_G(u; \gamma, \mu, r) = \frac{1}{1 + (\gamma|u|)^r}    for |u| \le \mu,
                                = 0    otherwise.

The name “the geometric family” is motivated by the fact that the expression
in (4.4) is the sum of a geometric series.

By expanding these functions in power series, it is immediately clear that
each of these kernels has characteristic exponent r, and characteristic coefficient
k^{(r)} = γ^r.

Another kernel that should be considered is given by 


(4.5) k,(u) = 1 for |u| <1 
= 0 otherwise 


which can be regarded as the limit of k_A(u; 1, 1, r) as r → ∞. This kernel gives rise
to the “truncated” estimate of the spectral density (see Grenander and Rosenblatt
[9], p. 148).

The estimates for the spectral density which have been suggested by Bartlett 
(see [9], p. 146) and Tukey (see [15], or [10], or [9], p. 149, for similar estimates) 
can be obtained from (2.5) by using respectively the kernels 


(4.6)    k_B(u) = 1 - |u|    for |u| \le 1,
                = 0    otherwise

and

(4.7)    k_T(u) = \tfrac{1}{2}(1 + \cos \pi u)    for |u| \le 1,
                = 0    otherwise.

These kernels are the same as k_A(u; 1, 1, 1) and k_C(u; π/2, 1, 2), respectively.
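The characteristic exponents of these two kernels can be illustrated numerically. The Python sketch below evaluates the ratio (1 − k(u))/|u|^r of (2.8) at a small value of u. It assumes the Bartlett kernel 1 − |u| and the Tukey kernel (1 + cos πu)/2 on |u| ≤ 1; the function names and the step u = 1e-4 are illustrative choices, not part of the paper.

```python
import math


def bartlett(u):
    """Bartlett kernel (4.6): 1 - |u| on [-1, 1], zero elsewhere."""
    return 1.0 - abs(u) if abs(u) <= 1.0 else 0.0


def tukey(u):
    """Tukey kernel (4.7), assumed form (1 + cos(pi u)) / 2 on [-1, 1]."""
    return 0.5 * (1.0 + math.cos(math.pi * u)) if abs(u) <= 1.0 else 0.0


def char_coefficient(kernel, r, u=1e-4):
    """Approximate k^(r) = lim_{u -> 0} (1 - k(u)) / |u|^r at a small u."""
    return (1.0 - kernel(u)) / abs(u) ** r


print(char_coefficient(bartlett, 1))   # close to 1, so r = 1
print(char_coefficient(tukey, 2))      # close to (pi / 2)^2, so r = 2
print(char_coefficient(tukey, 1))      # close to 0: r = 1 is not the exponent
```

The ratio is finite and non-zero only at the characteristic exponent, which is why the Tukey kernel reports a value near 0 for r = 1.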
Another estimate, which has been suggested by Daniell (see [1], or [3], or
[9], p. 147, where it is called the rectangular estimate), corresponds to using the
kernel

(4.8)    k_D(u) = \frac{\sin u}{u}.
This kernel may be written as the Fourier transform of the kernel defined by (4.5),
in the following way:


[ eh (a) dea 
(4.9) kp(u) = ——. 


[i hela) eo 


In a similar way, from the families of kernels k_A, k_C, k_E, and k_G, one may obtain
new families of kernels k_A′, k_C′, k_E′, and k_G′. One gives only the definition of
k_A′, since the others may be defined similarly:

(4.10)   k_A'(u; \gamma, \mu, r) = \frac{\int_{-\infty}^{\infty} e^{iu\omega}\,k_A(\omega; \gamma, \mu, r)\,d\omega}{\int_{-\infty}^{\infty} k_A(\omega; \gamma, \mu, r)\,d\omega}.


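To determine the properties of these primed kernels, one considers a general
kernel h(u), defined by

(4.11)   h(u) = \frac{\int_{-\infty}^{\infty} e^{iu\omega} H(\omega)\,d\omega}{\int_{-\infty}^{\infty} H(\omega)\,d\omega},

where H(ω) is a function satisfying the conditions

(4.12)   \int_{-\infty}^{\infty} |H(\omega)|\,d\omega < \infty,    \int_{-\infty}^{\infty} H^2(\omega)\,d\omega < \infty.

Then

(4.13)   \int_{-\infty}^{\infty} h^2(u)\,du = \frac{2\pi\int_{-\infty}^{\infty} H^2(\omega)\,d\omega}{\left[\int_{-\infty}^{\infty} H(\omega)\,d\omega\right]^2}.

One determines the characteristic exponent of h(u) under the assumptions that

(4.14)   \int_{-\infty}^{\infty} \omega H(\omega)\,d\omega = 0,    \int_{-\infty}^{\infty} \omega^2 H(\omega)\,d\omega < \infty.

By expanding e^{iuω} in Taylor series, it follows immediately that the characteristic
exponent is r = 2, and the characteristic coefficient is given by

(4.15)   h^{(2)} = \frac{1}{2}\,\frac{\int_{-\infty}^{\infty} \omega^2 H(\omega)\,d\omega}{\int_{-\infty}^{\infty} H(\omega)\,d\omega}.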


Now each of the kernels k_A, k_C, and k_E satisfies (4.12) and (4.14). Therefore,
the primed kernels k_A′, k_C′, and k_E′ have characteristic exponent 2, with characteristic
coefficient given by (4.15). The kernel k_G does not satisfy (4.14) for
r < 3; therefore the kernels k_G′, for r < 3, will have a characteristic exponent
less than 2.

One next evaluates the coefficient T(k), defined by (3.8), for some of the kernels
that have been introduced. One notes first that T(k) remains unchanged under
a change of scale; i.e., if two kernels k_1(u) and k_2(u) are related by the formula
k_1(u) = k_2(Bu) for some positive number B, then T(k_1) = T(k_2).

It is noted next that for the kernel h(u) defined by (4.11), if (4.14) holds,

(4.16)   T^2(h) = \frac{2\pi^2\int \omega^2 H(\omega)\,d\omega\,\left(\int H^2(\omega)\,d\omega\right)^2}{\left(\int H(\omega)\,d\omega\right)^5}.

Consequently, for the kernels k_A′(u; 1, 1, r), one obtains

(4.17)   T^2[k_A'(u; 1, 1, r)] = \frac{2\pi^2}{3}\,\frac{(r+1)^3}{(r+3)(2r+1)^2}.

For the algebraic kernels, one obtains

(4.18)   T[k_A(u; \gamma, \mu, r)] = 2(\mu\gamma) - \frac{4}{r+1}(\mu\gamma)^{r+1} + \frac{2}{2r+1}(\mu\gamma)^{2r+1}.

For the exponential kernels, one obtains

(4.19)   T[k_E(u; \gamma, \mu, r)] = 2\left(\frac{1}{2r}\right)^{1/r}\int_0^{\mu\gamma(2r)^{1/r}} e^{-t^r/r}\,dt.

For the cosine and geometric families of kernels, the results are given only
for r = 2:

(4.20)   T[k_C(u; \gamma, \mu, 2)] = \tfrac{3}{4}\mu\gamma + \tfrac{1}{2}\sin 2\mu\gamma + \tfrac{1}{16}\sin 4\mu\gamma,

(4.21)   T[k_G(u; \gamma, \mu, 2)] = \frac{\mu\gamma}{1 + (\mu\gamma)^2} + \tan^{-1}(\mu\gamma).
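The closed forms for T(k) at r = 2 can be compared with direct numerical integration of (3.8). The Python sketch below is only a check under stated assumptions: it takes T(k) = 2∫_0^{μγ} h²(s) ds with h(s) equal to 1 − s², cos² s, exp(−s²), and 1/(1 + s²) for the algebraic, cosine, exponential, and geometric families respectively, and it uses the closed forms as read here (with leading coefficient 3/4 in the cosine case and the error function for the exponential case). All function names are illustrative.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def T_numeric(h_fn, x):
    """T(k) = 2 * integral_0^x h(s)^2 ds, where x = mu * gamma."""
    return 2 * simpson(lambda s: h_fn(s) ** 2, 0.0, x)


def T_algebraic(x, r):
    return 2 * x - 4 * x ** (r + 1) / (r + 1) + 2 * x ** (2 * r + 1) / (2 * r + 1)


def T_cosine2(x):
    return 0.75 * x + 0.5 * math.sin(2 * x) + math.sin(4 * x) / 16


def T_exponential2(x):
    # (4.19) at r = 2: integral_0^{2x} exp(-t^2 / 2) dt
    return math.sqrt(math.pi / 2) * math.erf(math.sqrt(2) * x)


def T_geometric2(x):
    return x / (1 + x * x) + math.atan(x)


for x in (0.5, 1.0):
    print(x,
          T_numeric(lambda s: 1 - s ** 2, x), T_algebraic(x, 2),
          T_numeric(lambda s: math.cos(s) ** 2, x), T_cosine2(x),
          T_numeric(lambda s: math.exp(-s ** 2), x), T_exponential2(x),
          T_numeric(lambda s: 1 / (1 + s ** 2), x), T_geometric2(x))
```

Each numerical value agrees with its closed form, and every T(k) tends to 0 with μγ, which is the behaviour discussed in Sec. 5.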

5. On choosing a covariance averaging kernel. In this section, we argue the 
major proposition with which this paper is concerned, namely, that the notions 
of order of consistency and of asymptotic mean square error do not by themselves 
provide a basis for choosing between competing estimates of the spectral density. 
This is a negative statement. We also consider the positive question of how to 
make the choice. 

In order to put this proposition in its clearest light, we shall consider the
effect of using covariance averaging kernels of different functional forms but with
the same characteristic exponent r. To compare two kernels of the same characteristic
exponent r, we see from (3.6) that it is only necessary to compare T(k),
since the minimum observation time T_min is directly proportional to T(k), for
fixed r and η².
Fig. 1. The coefficient T(k) defined by (3.8), plotted as a function of μγ for the algebraic,
cosine, exponential, and geometric families of kernels of characteristic exponent r = 2, denoted
respectively by k_A(u; γ, μ, 2), k_C(u; γ, μ, 2), k_E(u; γ, μ, 2), and k_G(u; γ, μ, 2).


In Fig. 1, we plot T(k(u; γ, μ, r)) for the algebraic, cosine, exponential, and
geometric families with characteristic exponent 2. It turns out that T(k) is a
function only of the product of the parameters μ and γ. Further, T(k) may be
made as small as we please by choosing μγ sufficiently small. This fact cannot be
correct, of course, and emphasizes that (3.6) is not valid for μγ close to 0, which
is not a case where T is large and B_T is small.

However, even if we confine our attention to T(k) for values of μγ > ½, say,
the graphs in Fig. 1 are still disquieting. They seem to imply that the functional
form of the kernel k(u) is not too important, since if k_0(u; γ_0, μ_0, 2) is a kernel
belonging to the family of kernels k_0(u; γ, μ, 2), then given any other family of
kernels k_1(u; γ, μ, 2), there will exist a choice of parameter values γ_1 and μ_1 such
that the coefficients T(k) corresponding to the kernels k_1(u; γ_1, μ_1, 2) and
k_0(u; γ_0, μ_0, 2) are equal. Thus there appears to be a need for a principle which
would provide a means of choosing the parameters μ and γ. Such a principle
may provide a means of choosing between kernels of different functional form.

One such principle may be obtained as follows. An estimate f_T^*(ω) of the form
of (2.5) may achieve a small mean square error at the price of averaging over a
large band of frequencies. This fact is important if we desire to estimate the
difference f(ω_2) − f(ω_1) between the values of the spectral density at two neighboring
frequencies ω_1 and ω_2. As a measure of the ability of the estimate f_T^*(ω) to
estimate such differences we compute the mean square percentage error

(5.1)    \eta^2[D_d f_T^*(\omega)] = \frac{E\,|\{f_T^*(\omega + dB_T) - f_T^*(\omega)\} - \{f(\omega + dB_T) - f(\omega)\}|^2}{f^2(\omega)}


of the increment D_d f_T^*(ω) = f_T^*(ω + dB_T) − f_T^*(ω), where d > 0 is fixed. One
may show that

(5.2)    \lim_{T\to\infty} T B_T\,\eta^2[D_d f_T^*(\omega)] = 2\int_{-\infty}^{\infty} (1 - \cos du)\,k^2(u)\,du\,\{1 + \delta(0, \omega)\}.


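If one regards (5.2) as holding approximately for large values of T and small
values of B_T, one has, for ω ≠ 0,

(5.3)    \eta^2[D_d f_T^*(\omega)] = \frac{1}{T B_T}\,2\int_{-\infty}^{\infty} (1 - \cos du)\,k^2(u)\,du.

If one takes T = T_min and B_T = B_min, given by (3.6) and (3.7), respectively,
then

(5.4)    \eta^2[D_d f_{T_{\min}}^*(\omega)] = \frac{\eta^2}{1 + \frac{1}{2r}}\;\frac{2\int_{-\infty}^{\infty} (1 - \cos du)\,k^2(u)\,du}{\int_{-\infty}^{\infty} k^2(u)\,du}.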
The mean square percentage error in (5.4) is of the estimate of the increment
f(ω + dB_min) − f(ω). If one desires the mean square percentage error of the
estimate Δ_β f_T^*(ω) = f_T^*(ω + β) − f_T^*(ω) of the increment Δ_β f(ω) = f(ω + β) − f(ω),
one has, by setting d = β/B_min, that

(5.5)    \eta^2[\Delta_\beta f_{T_{\min}}^*(\omega)] = \frac{2\eta^2}{1 + \frac{1}{2r}}\,S_{\beta'}(k),

where we define

(5.6)    S_{\beta'}(k) = \frac{\int_{-\infty}^{\infty} \{1 - \cos(\beta'\,|k^{(r)}|^{1/r}\,u)\}\,k^2(u)\,du}{\int_{-\infty}^{\infty} k^2(u)\,du}

and β′ is defined as a function of β by

(5.7)    \beta' = \frac{\beta\,C(r)}{\lambda_r(\omega)\,\eta^{1/r}}.

Next, consider a kernel k(u) defined by

(5.8)    k(u) = h(\gamma u),    |u| \le \mu,
              = 0,    otherwise,

in terms of a kernel h(u) of characteristic exponent r and characteristic coefficient
h^{(r)} = 1. For k(u) defined by (5.8) we obtain from (3.8), (2.8), and (5.6)

(5.9)    T(k) = 2\int_0^{\mu\gamma} h^2(u)\,du,

(5.10)   |k^{(r)}|^{1/r} = \gamma,

(5.11)   S_{\beta'}(k) = \frac{\int_0^{\mu\gamma} (1 - \cos \beta' u)\,h^2(u)\,du}{\int_0^{\mu\gamma} h^2(u)\,du}.

We see from (5.9) and (5.11) that the properties of the kernel k(u) depend
only on the product μγ. We consequently take γ = 1. Next, let us consider how
to choose μ. One sees that, as μ → 0, T(k) → 0 in (5.9) and S_β′(k) → 1 in (5.11).
Now T(k) is proportional to the minimum observation time which can be attained
using the kernel corresponding to μ, while S_β′(k) is proportional to the
mean square error of the increment Δ_β f(ω), using the kernel corresponding to μ.
Thus μ must be chosen so as to strike a balance among these quantities.

Our criterion for choosing μ will follow from the following assumption on how
to design a spectral analysis. To our mind, in order to design a spectral analysis,
one must first specify a quantity η², which one desires the mean square percentage
error of one’s estimate of the spectral density not to exceed. One next specifies
quantities β and η_1², such that one desires to estimate the increment f(ω + β) −
f(ω) with a mean square percentage error less than or equal to η_1². One finally
assumes that the spectral bandwidth, of order r at the frequency ω, of the spectral
density function being estimated, is greater than or equal to a known quantity
λ_r(ω). Given a kernel k(u) of functional form (5.8), with γ = 1, one chooses μ
so that η²[Δ_β f_{T_min}^*(ω)], given by (5.5), is ≤ η_1². One then lets T = T_min, given
by (3.6), and B_T = B_min, given by (3.7). The estimate f_T^*(ω), given by (2.5), is
then completely defined. It will have the desired properties.

Finally, we may compare the properties of two families of estimates h_1(μu)
and h_2(μu). To each family h(μu) by the foregoing procedure one obtains a minimum
observation time T_min. One chooses that family of kernels which leads to
the smaller minimum observation time.

The foregoing procedure can be made routine if suitable tables and graphs 
are constructed. However, we have not made such computations. Before such a 
computational effort is made, it seems to us that the methods proposed for 
choosing an estimate of the spectral density function and for designing a spectral 
analysis, which after all are somewhat heuristic, should receive some public 
acceptance. 
BOUNDS FOR THE VARIANCE OF THE MANN-WHITNEY STATISTIC


By Z. W. BIRNBAUM AND ORVAL M. KLOSE
University of Washington


1. Summary. Let X, Y be independent random variables with continuous 
cumulative probability functions and let 


p = Pr{Y < X}.


For the variance of the Mann-Whitney statistic U, upper and lower bounds are 
obtained in terms of p, for the case of any X and Y as well as for the case of 
stochastically comparable X, Y. The results for the case of stochastic compara- 
bility are new, while the inequalities in the case of arbitrary X, Y have either 
been obtained by van Dantzig or are a consequence of other inequalities due to 
van Dantzig. 


2. Introduction and statement of results. Let X and Y be independent random
variables with the continuous cumulative probability distribution functions
(c.d.f.’s) F(x) and G(y), respectively, and let X_1, X_2, ···, X_m and
Y_1, Y_2, ···, Y_n be samples of these random variables. We consider the statistic

(2.1)    U = number of pairs (X_i, Y_j) such that Y_j < X_i,

introduced by Wilcoxon [1] for m = n and by Mann and Whitney [2] in the general
case.
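The definition (2.1) translates directly into a count over all pairs. The following Python sketch is a minimal illustration; the function name and the sample values are hypothetical.

```python
def mann_whitney_u(xs, ys):
    """Number of pairs (X_i, Y_j) with Y_j < X_i, as in (2.1)."""
    return sum(1 for x in xs for y in ys if y < x)


xs = [3.0, 5.0]        # sample of X, m = 2
ys = [1.0, 4.0, 6.0]   # sample of Y, n = 3
u = mann_whitney_u(xs, ys)
print(u, u / (len(xs) * len(ys)))   # U and the ratio U / mn
```

The ratio U/mn printed in the last line is the quantity used later in this section as an estimate of p.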

To simplify arguments we shall from now on assume that F(t) and G(t) are
both strictly increasing functions, although it can be easily seen that all conclusions
remain valid without this restriction. The function

(2.2)    L(t) = F[G^{-1}(t)],

which will be called the “relative distribution function of X and Y,” is a convenient
means of reducing many problems involving two probability distributions
to a study of a cumulative probability function on the unit interval. One verifies
easily that X and Y have the same distribution if and only if L(t) = t for
0 ≤ t ≤ 1. Similarly X is stochastically smaller than Y, that is, F(s) ≥ G(s)
for −∞ < s < +∞, if and only if L(t) ≥ t for 0 ≤ t ≤ 1.

Using the quantity

(2.3)    p = \Pr\{Y < X\} = \int_{-\infty}^{+\infty} G(s)\,dF(s) = \int_0^1 t\,dL(t)

and the relative distribution function L, one can rewrite expressions for the
expectation and the variance of U obtained by van Dantzig [4] in the form

(2.4.1)  E(U) = mnp,

(2.4.2)  \sigma^2(U) = mn\{(m-1)\varphi^2 + (n-1)\gamma^2 + p(1-p)\},

where

(2.4.3)  \varphi^2 = \int_{-\infty}^{+\infty} F^2\,dG - \left(\int_{-\infty}^{+\infty} F\,dG\right)^2 = \int_{-\infty}^{+\infty} F^2\,dG - (1-p)^2 = \sigma^2[F(Y)]
                   = \int_0^1 L^2(t)\,dt - (1-p)^2,

(2.4.4)  \gamma^2 = \int_{-\infty}^{+\infty} G^2\,dF - \left(\int_{-\infty}^{+\infty} G\,dF\right)^2 = \int_{-\infty}^{+\infty} G^2\,dF - p^2 = \sigma^2[G(X)]
                  = \int_0^1 t^2\,dL(t) - p^2.
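As a check on (2.4.2), consider identical distributions, for which L(t) = t, p = 1/2, and the integrals in (2.4.3) and (2.4.4) give φ² = γ² = 1/3 − 1/4 = 1/12. The Python sketch below, with hypothetical function and variable names, evaluates (2.4.2) in exact arithmetic and compares it with the familiar null variance mn(m + n + 1)/12 of the Mann-Whitney statistic.

```python
from fractions import Fraction


def var_u(m, n, p, phi2, gamma2):
    """Variance of U from (2.4.2)."""
    return m * n * ((m - 1) * phi2 + (n - 1) * gamma2 + p * (1 - p))


# Identical distributions: L(t) = t, so p = 1/2 and phi^2 = gamma^2 = 1/12.
m, n = 4, 6
null_var = var_u(m, n, Fraction(1, 2), Fraction(1, 12), Fraction(1, 12))
print(null_var, Fraction(m * n * (m + n + 1), 12))
```

The sample sizes m = 4 and n = 6 are arbitrary; the two printed values coincide for any m and n.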


In Sec. 3, inequalities involving φ² and γ² will be derived which will be used to
obtain Theorem 3.2 on the sharp upper bound

(2.5)    \sigma^2(U) \le mnp(1-p)\max(m, n)

and Theorem 3.5 on the sharp lower bound

(2.6)    \sigma^2(U) \ge \mu\nu\left[\mu r(1-r) - \frac{(\mu-1)^2}{12(\nu-1)}\right]    if \frac{\mu-1}{\nu-1} \le 2r,
         \sigma^2(U) \ge \mu\nu\left[\tfrac{4}{3}\,r\sqrt{2(\mu-1)(\nu-1)r} - (\mu+\nu-2)r^2 + r(1-r)\right]    if \frac{\mu-1}{\nu-1} \ge 2r,

where μ = min (m, n), ν = max (m, n), r = min (p, 1 − p). The upper bound
(2.5) has been obtained by van Dantzig [4] and is discussed here only for the
sake of completeness and convenient reference. While it is believed that (2.6)
has not been stated elsewhere, the inequalities involving φ² and γ² on which
it is based are essentially modifications of analogous inequalities obtained by
van Dantzig [5].
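The two general bounds can be evaluated side by side. The Python sketch below assumes the following reading of (2.6): with μ = min(m, n), ν = max(m, n), r = min(p, 1 − p), the lower bound is μν[μr(1 − r) − (μ − 1)²/(12(ν − 1))] when (μ − 1)/(ν − 1) ≤ 2r, and μν[(4/3)r√(2(μ − 1)(ν − 1)r) − (μ + ν − 2)r² + r(1 − r)] otherwise. The function names are illustrative, and the sketch requires max(m, n) ≥ 2.

```python
import math


def upper_bound(m, n, p):
    """Upper bound (2.5) on the variance of U."""
    return m * n * p * (1 - p) * max(m, n)


def lower_bound(m, n, p):
    """Lower bound (2.6), assumed reading; requires max(m, n) >= 2."""
    mu, nu = min(m, n), max(m, n)
    r = min(p, 1 - p)
    if (mu - 1) / (nu - 1) <= 2 * r:
        return mu * nu * (mu * r * (1 - r) - (mu - 1) ** 2 / (12 * (nu - 1)))
    root = math.sqrt(2 * (mu - 1) * (nu - 1) * r)
    return mu * nu * (4 / 3 * r * root - (mu + nu - 2) * r ** 2 + r * (1 - r))


# For identical distributions (p = 1/2) the exact variance is mn(m+n+1)/12.
m, n = 4, 6
exact_null = m * n * (m + n + 1) / 12
print(lower_bound(m, n, 0.5), exact_null, upper_bound(m, n, 0.5))
```

For m = 4, n = 6 the exact null variance lies between the two bounds, as it must if the reading of (2.6) is correct.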

In Sec. 4 similar inequalities for φ² and γ² are obtained which yield Theorem
4.2 on the sharp upper bound

(2.7)    \sigma^2(U) \le \mu\nu\{\nu[\tfrac{1}{3}(1 - (1-2p)^{3/2}) - p^2]
                 + \mu[-\tfrac{2}{3}(1 - (1-2p)^{3/2}) + 2p - p^2]
                 + \tfrac{1}{3}[1 - (1-2p)^{3/2}] - p(1-p)\},
and Theorem 4.5 on the sharp lower bound

(2.8)    \sigma^2(U) \ge
           mn\{\tfrac{1}{3}[m + n + 1 + 2\sqrt{(m-1)(m-n)(1-2p)^3}] - [m(1-p)^2 + np^2 + p(1-p)]\}    if \frac{n-1}{m-1} \le 2p,
           mn\{\tfrac{4}{3}\,p\sqrt{2(m-1)(n-1)p} - (m+n-2)p^2 + p(1-p)\}    if 2p \le \frac{n-1}{m-1} \le \frac{1}{2p},
           mn\{\tfrac{1}{3}[m + n + 1 + 2\sqrt{(n-1)(n-m)(1-2p)^3}] - [mp^2 + n(1-p)^2 + p(1-p)]\}    if \frac{1}{2p} \le \frac{n-1}{m-1},

under the assumption that X is stochastically smaller than Y, that is, F(s) ≥
G(s), −∞ < s < +∞. These results are new. Due to the imposition of the
stochastic comparability condition the bounds (2.7) and (2.8) of course are
better than the bounds (2.5) and (2.6) for the general case.

Upper bounds such as (2.5) and (2.7) are useful in problems of estimating the
parameter p by p̂ = U/mn (see [3]). Lower bounds are needed in obtaining inequalities
for the power of the Mann-Whitney test such as those given in [4].


3. Inequalities for φ², γ², σ²(U) in the general case.
3.1. Lemma. With the notations of the preceding section we have

(3.1)    \int_0^1 (L(t) - t)^2\,dt \le \tfrac{1}{3} - p(1-p),

and equality holds for

(3.1.1)  L_1(t) = 0    for 0 \le t < p,
                = 1    for p \le t \le 1,

and for

(3.1.2)  L_2(t) = 1 - p    for 0 < t < 1.
Proof. In the identity

\int_0^1\!\int_0^t [L(t) - L(s)]\,ds\,dt = \int_0^1 (2t-1)L(t)\,dt

the integrand on the left side satisfies 0 ≤ L(t) − L(s) ≤ 1 for 0 ≤ s ≤ t ≤ 1,
hence L(t) − L(s) ≥ [L(t) − L(s)]², and

\int_0^1 (2t-1)L(t)\,dt \ge \int_0^1\!\int_0^t [L(t) - L(s)]^2\,ds\,dt = \int_0^1 L^2(t)\,dt - \left[\int_0^1 L(t)\,dt\right]^2.













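Therefore

\int_0^1 (L(t) - t)^2\,dt = \int_0^1 L^2(t)\,dt - 2\int_0^1 tL(t)\,dt + \tfrac{1}{3}
        \le \left[\int_0^1 L(t)\,dt\right]^2 - \int_0^1 L(t)\,dt + \tfrac{1}{3} = \tfrac{1}{3} - p(1-p).

One verifies by direct computation that L_1(t) and L_2(t) yield equality in (3.1).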
3.2. Theorem. The variance of U has the upper bound

(3.2)    \sigma^2(U) \le mnp(1-p)\max(m, n).

Equality holds for L_2 if n ≥ m, and for L_1 if m ≥ n.


Proof. We use the equality

\varphi^2 + \gamma^2 = \int_0^1 [L(t) - t]^2\,dt + \tfrac{2}{3} - p^2 - (1-p)^2

and, if n ≥ m, write (2.4.2) in the form

(3.2.1)  \sigma^2(U) = mn\{(m-1)(\varphi^2 + \gamma^2) + (n-m)\gamma^2 + p(1-p)\}
                     = mn\{(m-1)\int_0^1 [L(t) - t]^2\,dt + (n-m)\gamma^2
                           + (m-1)[\tfrac{2}{3} - p^2 - (1-p)^2] + p(1-p)\}.

Noting that

(3.2.2)  \gamma^2 = \int_0^1 t^2\,dL(t) - p^2 \le \int_0^1 t\,dL(t) - p^2 = p(1-p)

and making use of (3.1) we obtain (3.2). Since equality holds for L_2(t) in (3.1)
and in (3.2.2), the upper bound is attained in (3.2) for L_2(t), if n ≥ m. The case
m ≥ n follows by a symmetrical argument.

3.3. Lemma. Let F_1 and F_2 be strictly increasing continuous c.d.f.’s with

(3.3.1)  \int_{-\infty}^{+\infty} F_1\,dF_2 = p_1,    \int_{-\infty}^{+\infty} F_2\,dF_1 = p_2,

hence

(3.3.2)  p_1 + p_2 = 1;

and let

(3.3.4)  \varphi_1^2 = \int F_1^2\,dF_2 - p_1^2,    \varphi_2^2 = \int F_2^2\,dF_1 - p_2^2.

Then, for any μ_1 ≥ 0, μ_2 ≥ 0, μ_1 + μ_2 > 0, we have

(3.3.5)  \mu_1\varphi_1^2 + \mu_2\varphi_2^2 \ge \mu_u\left[p_1 p_2 - \frac{\mu_u}{12\mu_v}\right],
for u = 1, v = 2, as well as for u = 2, v = 1. Inequality (3.3.5) can not be improved
if

(3.3.6)  \frac{\mu_u}{\mu_v} \le \min(2p_u, 2p_v).
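Proof. Writing

F_u(s) = t,    F_v[F_u^{-1}(t)] = L(t),

we have

p_u = \int_0^1 t\,dL(t),    p_v = \int_0^1 L(t)\,dt,

\varphi_u^2 = \int_0^1 t^2\,dL(t) - p_u^2,    \varphi_v^2 = \int_0^1 L^2(t)\,dt - p_v^2.

For any real α, β,

(3.3.6.1)  0 \le \int_0^1 [L(t) - \alpha t - \beta]^2\,dt = \varphi_v^2 + p_v^2 + \frac{\alpha^2}{3} + \beta^2 - \alpha
                 + \alpha[\varphi_u^2 + p_u^2] - 2\beta p_v + \alpha\beta,

and

\varphi_v^2 + \alpha\varphi_u^2 \ge \alpha - \frac{\alpha^2}{3} - \alpha p_u^2 - p_v^2 - (\beta^2 + \alpha\beta - 2\beta p_v)
      = \alpha p_u p_v - \frac{\alpha^2}{12} - \left(\beta + \frac{\alpha}{2} - p_v\right)^2.

For fixed α, the right-hand expression is maximum at β = p_v − (α/2), so that

\varphi_v^2 + \alpha\varphi_u^2 \ge \alpha\left(p_u p_v - \frac{\alpha}{12}\right).

Setting α = (μ_u/μ_v), we obtain (3.3.5). Equality holds if and only if L(t) =
αt + β for 0 < t < 1, with α = (μ_u/μ_v) and β = p_v − (α/2), that is, for

(3.3.7)  L_3(t) = \frac{\mu_u}{\mu_v}\,t + p_v - \frac{\mu_u}{2\mu_v},    0 < t < 1,

and this is a c.d.f. if and only if L(0) ≥ 0, L(1) ≤ 1, which is equivalent with
(3.3.6).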
3.4. Lemma. Under the assumptions 


(3.4.1) 


(3.4.2) 
we have

(3.4.3)  (m-1)\varphi^2 + (n-1)\gamma^2 \ge (m-1)(1-2p) + \tfrac{4}{3}\,p\sqrt{2(m-1)(n-1)p} - [(m-1)(1-p)^2 + (n-1)p^2]
              = \tfrac{4}{3}\,p\sqrt{2(m-1)(n-1)p} - (m+n-2)p^2,

and this inequality can not be improved.
Proof. For any α > 0, 0 ≤ β ≤ 1, α + β ≥ 1, we have 0 ≤ (1 − β)/α ≤ 1
and

(3.4.4)  \int_0^1 [L(t) - \alpha t - \beta]^2\,dt \ge \int_{(1-\beta)/\alpha}^1 [L(t) - \alpha t - \beta]^2\,dt
              \ge \int_{(1-\beta)/\alpha}^1 (\alpha t + \beta - 1)^2\,dt = \frac{(\alpha + \beta - 1)^3}{3\alpha}.

From this and

(3.4.4.1)  \int_0^1 [L(t) - \alpha t - \beta]^2\,dt = \varphi^2 + \alpha\gamma^2 + (1-p)^2 + \alpha p^2 + \frac{\alpha^2}{3}
                - \alpha + \beta^2 + \alpha\beta - 2\beta(1-p)

follows

\varphi^2 + \alpha\gamma^2 \ge \alpha - \frac{\alpha^2}{3} - \alpha p^2 - (1-p)^2 - \alpha\beta - \beta^2 + 2\beta(1-p) + \frac{(\alpha + \beta - 1)^3}{3\alpha}.

For fixed p and α, the right side is maximum for β = 1 − √(2αp). This value
satisfies the conditions 0 ≤ β ≤ 1, α + β ≥ 1, if and only if

(3.4.5)  2p \le \alpha \le \frac{1}{2p},

and then we obtain

(3.4.6)  \varphi^2 + \alpha\gamma^2 \ge 1 - 2p + \tfrac{4}{3}\,p\sqrt{2\alpha p} - [\alpha p^2 + (1-p)^2].

If m ≥ n, then (3.4.2) becomes [(n − 1)/(m − 1)] ≥ 2p, so that
α = [(n − 1)/(m − 1)] satisfies (3.4.5), and for this value of α inequality
(3.4.6) yields (3.4.3).

If m < n, then (3.4.2) becomes [(m − 1)/(n − 1)] ≥ 2p, the value
α = [(n − 1)/(m − 1)] again satisfies (3.4.5) and we obtain (3.4.3) from
(3.4.6).


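Equality holds in (3.4.4) if and only if

L(t) = \alpha t + \beta    for 0 \le t \le (1-\beta)/\alpha,
     = 1    for (1-\beta)/\alpha \le t \le 1,

so that equality is attained in (3.4.3) for

L(t) = \frac{n-1}{m-1}\,t + 1 - \sqrt{2\,\frac{n-1}{m-1}\,p}    for 0 \le t \le \sqrt{2\,\frac{m-1}{n-1}\,p},
     = 1    for \sqrt{2\,\frac{m-1}{n-1}\,p} \le t \le 1.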

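3.5. Theorem. Under the assumption

(3.5.1)  p \le \tfrac{1}{2}

and with the notations

(3.5.2)  \mu = \min(m, n),    \nu = \max(m, n),

we have

(3.5.3)  \sigma^2(U) \ge \mu\nu\left[\mu p(1-p) - \frac{(\mu-1)^2}{12(\nu-1)}\right]    if \frac{\mu-1}{\nu-1} \le 2p,

(3.5.4)  \sigma^2(U) \ge \mu\nu\left[\tfrac{4}{3}\,p\sqrt{2(\mu-1)(\nu-1)p} - (\mu+\nu-2)p^2 + p(1-p)\right]    if \frac{\mu-1}{\nu-1} \ge 2p,

and these inequalities can not be improved.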

Proof. Assumption (3.5.1) constitutes no loss of generality since, in case it is
not satisfied for p defined by (2.3), it will be satisfied if F and G are interchanged.
Using the notations (3.5.2) and setting in Lemma 3.3: p_1 = p, p_2 = 1 − p,
μ_1 = n − 1, μ_2 = m − 1, φ_1 = γ, φ_2 = φ, μ_u = μ − 1, μ_v = ν − 1, we obtain
(3.5.3) from (3.3.5) and (2.4.2). Inequality (3.5.4) follows immediately from
Lemma 3.4 and (2.4.2).


4. Inequalities for the case of X and Y stochastically comparable. Throughout
this section X will be assumed stochastically smaller, that is F(s) ≥ G(s) or,
in terms of the relative c.d.f.,

(4.0.1)  t \le L(t),    for 0 \le t \le 1.

According to (2.3) this implies

(4.0.2)  p \le \tfrac{1}{2}.

We introduce the abbreviations

(4.0.3)  A(L) = \int_0^1 [L(t) - t]^2\,dt,

(4.0.4)  B(L) = \int_0^1 L^2(t)\,dt = \varphi^2 + (1-p)^2,

(4.0.5)  C(L) = \int_0^1 t^2\,dL(t) = 1 - 2\int_0^1 tL(t)\,dt = \gamma^2 + p^2.
4.1. Lemma. Let p ≤ ½ be given and let L(t) ≥ t be such that ∫_0^1 L(t) dt = 1 − p.
Consider the family of functions

(4.1.1)  L_r(t) = t    for 0 \le t < r,
                = r + \sqrt{1-2p}    for r \le t < r + \sqrt{1-2p},
                = t    for r + \sqrt{1-2p} \le t \le 1,

defined for 0 ≤ r ≤ 1 − √(1 − 2p). For these functions we have

(4.1.2)  \int_0^1 L_r(t)\,dt = 1 - p,    0 \le r \le 1 - \sqrt{1-2p},

(4.1.3)  A(L) \le A(L_r) = \tfrac{1}{3}(1-2p)^{3/2},    0 \le r \le 1 - \sqrt{1-2p},

(4.1.4)  \tfrac{2}{3}(1-2p)^{3/2} + \tfrac{1}{3} = B(L_0) \le B(L) \le B(L_{1-\sqrt{1-2p}})
              = 1 - 2p + \tfrac{1}{3} - \tfrac{1}{3}(1-2p)^{3/2},

(4.1.5)  2p - \tfrac{2}{3} + \tfrac{2}{3}(1-2p)^{3/2} = C(L_{1-\sqrt{1-2p}}) \le C(L) \le C(L_0)
              = \tfrac{1}{3} - \tfrac{1}{3}(1-2p)^{3/2}.

Proof. Since a continuous L(t) ≥ t can be uniformly approximated by a
“saw-tooth” function, i.e., by a relative c.d.f. whose graph consists of a finite
number of line-segments, either horizontal or on the line l (see Fig. 1), it will be
sufficient to carry out the proof for such functions only.

Let us first consider an “‘isolated’’ tooth, such as K in Fig. 1, and translate 
it by A > 0 to position K’, thereby replacing L by L*, say. It is clear that 


\int_0^1 L^*(t)\,dt = \int_0^1 L(t)\,dt,    A(L^*) = A(L),

B(L^*) > B(L),    C(L^*) < C(L).


Translating each isolated tooth as far as possible to the right we obtain a saw- 
tooth function L** for which all teeth are adjacent and the last to the right ends 
with a horizontal line-segment with ordinate 1 (such as all teeth in Fig. 1, except 
K), and for which 


\int_0^1 L^{**}(t)\,dt = \int_0^1 L(t)\,dt = 1 - p,    A(L^{**}) = A(L),

B(L^{**}) > B(L),    C(L^{**}) < C(L).


Now consider a pair of adjacent teeth, such as M and N in Fig. 1. If the
vertices M, N of these teeth have the coordinates (t_1, u_1), (t_2, u_2), we replace them
by one tooth with the vertex P(t_1′, u_2) where

t_1' = u_2 - \sqrt{(u_2 - t_1)^2 - 2(t_2 - t_1)(u_2 - u_1)}.
Fig. 1


Again it is clear that for the resulting L*** we have ∫_0^1 L***(t) dt = ∫_0^1 L**(t) dt,
and one verifies by direct computation that the contribution of the interval
(t_1, u_2) to the integrals A and B increases as L** is replaced by L***, while the
corresponding contribution to the integral C decreases. After a finite number of
such steps, each of which merges a tooth with its neighbor to the right, we
obtain L_{1−√(1−2p)}, which proves the inequalities involving L_{1−√(1−2p)} in (4.1.4) and
(4.1.5), while (4.1.3) follows from the observation that A(L_r) takes the same
value for each value of r as for the value r = 1 − √(1 − 2p). The inequalities
involving L_0 are obtained by an analogous argument in which first all isolated
saw-teeth are translated to the left as far as possible and then each tooth is, in
succession, merged with its neighbor to the left.

4.2. Theorem. For p given and any relative c.d.f. $L(t) \ge t$ with $\int_0^1 L(t)\,dt = 1 - p$
(implying $p \le \tfrac12$), the variance of U has the upper bound

(4.2.1)  $\sigma^2(U) \le mn\{\max(m,n)\,[\tfrac13(1 - (1 - 2p)^{3/2}) - p^2] + \min(m,n)\,[-\tfrac23(1 - (1 - 2p)^{3/2}) + 2p - p^2] + \tfrac13[1 - (1 - 2p)^{3/2}] - p(1 - p)\}.$

Equality holds for $L = L_0$ if $n \ge m$ and for $L = L_{1-\sqrt{1-2p}}$ if $n \le m$.

Proof. If $n \ge m$, we write (3.2.1) in the form

$\sigma^2(U) = mn\{(m - 1)A(L) + (n - m)C(L) - (n - m)p^2 + (m - 1)[\tfrac23 - p^2 - (1 - p)^2] + p(1 - p)\}.$

Setting $L = L_0$ in the right side we obtain the theorem from Lemma 4.1. A
symmetrical argument, stressing B(L) instead of C(L), completes the proof for
$n \le m$.
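4.3. Lemma. Under the assumptions (4.0.1) and

(4.3.1)  $\frac{n - 1}{m - 1} \le 2p$

we have

(4.3.2)  $(m - 1)\varepsilon^2 + (n - 1)\gamma^2 \ge \tfrac13\{m + n - 2 + 2\sqrt{(m - 1)(m - n)(1 - 2p)^3}\} - [(m - 1)(1 - p)^2 + (n - 1)p^2]$

and this inequality can not be improved.

Proof. If $0 \le \alpha < 1$ and $0 \le \beta \le 1 - \alpha$ then $0 \le \beta/(1 - \alpha) \le 1$ and in
view of (4.0.1) we have

(4.3.3)  $\int_0^1 [L(t) - \alpha t - \beta]^2\,dt \ge \int_{\beta/(1-\alpha)}^1 [L(t) - \alpha t - \beta]^2\,dt \ge \int_{\beta/(1-\alpha)}^1 [t - \alpha t - \beta]^2\,dt = \frac{(1 - \alpha - \beta)^3}{3(1 - \alpha)}.$

From this and (3.4.4.1) follows

$\varepsilon^2 + \alpha\gamma^2 \ge \alpha - \frac{\alpha^2}{3} - \alpha p^2 - (1 - p)^2 + 2\beta(1 - p) - \alpha\beta - \beta^2 + \frac{(1 - \alpha - \beta)^3}{3(1 - \alpha)}.$

For fixed p and $\alpha$, the right side is maximum for $\beta = \sqrt{(1 - \alpha)(1 - 2p)}$, and
this value satisfies the condition $0 \le \beta \le 1 - \alpha$ if and only if $\alpha \le 2p$. Consequently,
for $\alpha \le 2p$, we have

$\varepsilon^2 + \alpha\gamma^2 \ge \tfrac13\{1 + \alpha + 2\sqrt{(1 - \alpha)(1 - 2p)^3}\} - [\alpha p^2 + (1 - p)^2].$

Setting $\alpha = (n - 1)/(m - 1)$, which is $\le 2p$ by (4.3.1), we obtain (4.3.2).
Equality in (4.3.3) is attained if and only if

$L(t) = \begin{cases} \alpha t + \beta & \text{for } 0 < t < \beta/(1 - \alpha), \\ t & \text{for } \beta/(1 - \alpha) \le t \le 1, \end{cases}$

so that, with $\beta = \sqrt{(1 - \alpha)(1 - 2p)}$, $\alpha = (n - 1)/(m - 1)$, we obtain the
function

$L(t) = \begin{cases} \dfrac{n - 1}{m - 1}\,t + \sqrt{\dfrac{(m - n)(1 - 2p)}{m - 1}} & \text{for } 0 < t < \sqrt{\dfrac{(m - 1)(1 - 2p)}{m - n}}, \\[2ex] t & \text{for } \sqrt{\dfrac{(m - 1)(1 - 2p)}{m - n}} \le t \le 1. \end{cases}$
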
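4.4. Lemma. Under the assumptions (4.0.1) and

(4.4.1)  $\frac{m - 1}{n - 1} \le 2p,$

we have

(4.4.2)  $(m - 1)\varepsilon^2 + (n - 1)\gamma^2 \ge \tfrac13\{m + n - 2 + 2\sqrt{(n - 1)(n - m)(1 - 2p)^3}\} - [(m - 1)p^2 + (n - 1)(1 - p)^2],$

and this inequality can not be improved.

Proof. If $\alpha \ge 1$ and $0 \le -\beta \le \alpha - 1$, then

$0 \le \beta/(1 - \alpha) \le (1 - \beta)/\alpha \le 1$

and in view of (4.0.1) we have

(4.4.3)  $\int_0^1 [L(t) - \alpha t - \beta]^2\,dt \ge \int_0^{\beta/(1-\alpha)} [t - \alpha t - \beta]^2\,dt + \int_{(1-\beta)/\alpha}^1 [\alpha t + \beta - 1]^2\,dt = \frac13\left[\frac{\beta^3}{1 - \alpha} + \frac{(\alpha + \beta - 1)^3}{\alpha}\right].$

From this and (3.4.4.1) follows

$\varepsilon^2 + \alpha\gamma^2 \ge \alpha - \frac{\alpha^2}{3} - \alpha p^2 - (1 - p)^2 - \alpha\beta - \beta^2 + 2\beta(1 - p) + \frac13\left[\frac{\beta^3}{1 - \alpha} + \frac{(\alpha + \beta - 1)^3}{\alpha}\right].$

For fixed $\alpha$, p, the right side is maximum for $\beta = 1 - \alpha + \sqrt{\alpha(\alpha - 1)(1 - 2p)}$
and this satisfies the condition $0 \le -\beta \le \alpha - 1$ if and only if $(1/\alpha) \le 2p$.
It follows that for $(1/\alpha) \le 2p$,

$\varepsilon^2 + \alpha\gamma^2 \ge \tfrac13\{1 + \alpha + 2\sqrt{\alpha(\alpha - 1)(1 - 2p)^3}\} - [\alpha(1 - p)^2 + p^2],$

and for $\alpha = (n - 1)/(m - 1)$, this inequality yields (4.4.2). Equality in
(4.4.3) holds if and only if

$L(t) = \begin{cases} t & \text{for } 0 \le t \le \beta/(1 - \alpha), \\ \alpha t + \beta & \text{for } \beta/(1 - \alpha) \le t \le (1 - \beta)/\alpha, \\ 1 & \text{for } (1 - \beta)/\alpha \le t \le 1, \end{cases}$

so that for $\alpha = (n - 1)/(m - 1)$, $\beta = 1 - \alpha + \sqrt{\alpha(\alpha - 1)(1 - 2p)}$, we
obtain the relative distribution function

$L(t) = \begin{cases} t & \text{for } 0 \le t \le t_1, \\[1ex] \dfrac{n - 1}{m - 1}\,t + \dfrac{m - n}{m - 1} + \dfrac{1}{m - 1}\sqrt{(n - 1)(n - m)(1 - 2p)} & \text{for } t_1 \le t \le t_2, \\[1ex] 1 & \text{for } t_2 \le t \le 1, \end{cases}$

where

$t_1 = 1 - \sqrt{\frac{n - 1}{n - m}(1 - 2p)}, \qquad t_2 = 1 - \sqrt{\frac{n - m}{n - 1}(1 - 2p)}.$
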
4.5. Theorem. Under the assumptions of Theorem 4.2, the variance of U has
the lower bounds

(4.5.1)  $\sigma^2(U) \ge mn\{\tfrac13[m + n + 1 + 2\sqrt{(m - 1)(m - n)(1 - 2p)^3}] - [m(1 - p)^2 + np^2 + p(1 - p)]\}$  if $\frac{n - 1}{m - 1} \le 2p$,

(4.5.2)  $\sigma^2(U) \ge mn\{\tfrac43 p\sqrt{2p(m - 1)(n - 1)} - (m + n - 2)p^2 + p(1 - p)\}$  if $2p \le \frac{n - 1}{m - 1} \le \frac{1}{2p}$,

(4.5.3)  $\sigma^2(U) \ge mn\{\tfrac13[m + n + 1 + 2\sqrt{(n - 1)(n - m)(1 - 2p)^3}] - [mp^2 + n(1 - p)^2 + p(1 - p)]\}$  if $\frac{1}{2p} \le \frac{n - 1}{m - 1}$.

These lower bounds can not be improved.

Proof. Inequality (4.5.1) follows from (2.4.2) and Lemma 4.3, with equality
attained for the function constructed at the end of the proof of that lemma, and
(4.5.3) follows from (2.4.2) and Lemma 4.4, with equality holding for the function
constructed at the end of the proof of that lemma. Inequality (4.5.2) is the same
as (3.5.4), which was proven for general relative c.d.f. L(t), without assuming
(4.0.1), and which holds whether $m \le n$ or $m > n$ since the right-hand side is
symmetric in m, n. The lower bound (4.5.2) cannot be improved even under
assumption (4.0.1) of stochastic comparability, for $L_4(t)$ yields equality and
satisfies (4.0.1).
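The three cases of Theorem 4.5 can be collected into one routine. The sketch below is illustrative only: the function names are hypothetical, and the formulas are the bounds (4.5.1) to (4.5.3) as read above. A useful sanity check is that adjacent cases agree on the boundaries $(n-1)/(m-1) = 2p$ and $(n-1)/(m-1) = 1/(2p)$.

```python
import math

def bound_451(m, n, p):
    """Right side of (4.5.1); intended for (n - 1)/(m - 1) <= 2p."""
    root = math.sqrt((m - 1) * (m - n) * (1 - 2 * p) ** 3)
    return m * n * ((m + n + 1 + 2 * root) / 3
                    - (m * (1 - p) ** 2 + n * p ** 2 + p * (1 - p)))

def bound_452(m, n, p):
    """Right side of (4.5.2); intended for 2p <= (n - 1)/(m - 1) <= 1/(2p)."""
    return m * n * (4 / 3 * p * math.sqrt(2 * p * (m - 1) * (n - 1))
                    - (m + n - 2) * p ** 2 + p * (1 - p))

def bound_453(m, n, p):
    """Right side of (4.5.3); intended for (n - 1)/(m - 1) >= 1/(2p)."""
    root = math.sqrt((n - 1) * (n - m) * (1 - 2 * p) ** 3)
    return m * n * ((m + n + 1 + 2 * root) / 3
                    - (m * p ** 2 + n * (1 - p) ** 2 + p * (1 - p)))

def variance_lower_bound(m, n, p):
    """Pick the applicable case; assumes m, n >= 2 and 0 < p <= 1/2."""
    alpha = (n - 1) / (m - 1)
    if alpha <= 2 * p:
        return bound_451(m, n, p)
    if alpha >= 1 / (2 * p):
        return bound_453(m, n, p)
    return bound_452(m, n, p)
```

For p = 1/2 and m = n the bound reduces to mn(m + n + 1)/12, the familiar variance of U when the two distributions coincide.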
EXACT MARKOV PROBABILITIES FROM ORIENTED LINEAR GRAPHS


By Reed Dawson and I. J. Good


Silver Spring, Maryland and Cheltenham, England 


0. Summary. Using a theorem due to de Bruijn, van Aardenne-Ehrenfest, 
C. A. B. Smith and Tutte concerning the number of circuits in oriented linear 
graphs, an expression is found for the probability of a specified frequency count 
of m-tuples in a circular sequence where the n-tuple (n < m) count is given. 
The corresponding result for linear sequences can be deduced—see [14]. The 
result is valid for stationary Markovity of any order up to and including the 
(n — 1)-st. A method of deriving asymptotic distributions is indicated, and a 
few additional observations made concerning the distribution of pairs in a cir- 
cular array. 


1. Introduction. In studying runs, W. L. Stevens [10] considered the distribu- 
tion of pairs of successive digits when n zeros and N — n ones are randomly 
permuted about an oriented circle. He found the probability of A occurrences of 
0 followed by 0, B occurrences of 01, C of 10 and D of 11, for any A, B, C, D 
subject to 


$A + B = A + C = n$ and $C + D = B + D = N - n$.


(Stevens’ result has been generalized by Mood [9] and Whittle [13].) By using a 
combinatorial theorem due to de Bruijn, van Aardenne-Ehrenfest, C. A. B. 
Smith and Tutte (hereinafter known by initials as the BEST theorem), Stevens’ 
result can be generalized to an expression for the probability of a specified m- 
tuple frequency count in a (circular) sequence of given n-tuple count, where the 
alphabet may be of any finite size. The BEST theorem was first stated, im- 
plicitly, as a ‘‘note added in proof” on page 217 of de Bruijn and Ehrenfest [2], 
and is largely based on Tutte [11] and Tutte and Smith [12]. 


2. The BEST theorem. Given any $u \times u$ matrix of nonnegative integers there
corresponds an oriented linear graph, with vertices 1, 2, ... , u, such that the 
number of oriented paths (edges) leading from vertex r to vertex s equals the 
matrix element in row r and column s. The matrix, unique to within the same 
rearrangement of rows as of columns, is called the “incidence matrix” of the 
corresponding oriented linear graph. The graph is “simple” (in the sense of 
Tutte [11] or a “T-graph” in the notation of de Bruijn and Ehrenfest [2]) if the 
number of edges leading into each vertex equals the number leading out, or, in 
terms of the incidence matrix, if each row has the same sum as the corresponding 
column. A (complete) circuit in such a graph is defined as a unicursal path passing 
exactly once through each edge (in the right direction). The BEST theorem 
gives the number of distinct circuits when all edges are regarded as distinguish- 
able. 
Let $M = [m_{ij}]$ be the incidence matrix of some simple oriented linear graph
and let

$m_i = \sum_j m_{ij} = \sum_j m_{ji}$

be the sum of the ith row (and of the ith column). Let $M' = [m'_{ij}]$ be the $u' \times u'$
matrix formed from M by deleting every row and column consisting wholly of
zeros (in effect eliminating vertices lying on no edges). Then

$\sum_j m'_{ij} = \sum_j m'_{ji} = m'_i$, say, where $m'_i > 0$.

Let $M^* = [m'_i \delta_{ij}] - M'$, i.e., the matrix of entries $m^*_{ij} = -m'_{ij}$ for $i \ne j$ and
$m^*_{ii} = m'_i - m'_{ii}$. Since $M^*$ is a square matrix with each row and column summing
to zero, the cofactors of its elements are all equal; let $\|M^*\|$ be the common
value of these cofactors. Then the BEST theorem asserts that the number
of circuits, C(M), in a simple oriented linear graph with incidence matrix M is

(1)  $C(M) = \|M^*\| \cdot \prod_{i=1}^{u'} (m'_i - 1)!.$
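Formula (1) translates directly into a short computation. The following is a sketch under stated assumptions rather than anything given in the paper: the names `cofactor` and `count_circuits` are hypothetical, the matrix is a list of lists of nonnegative integers, and the cofactor is taken at the last diagonal element using exact rational elimination.

```python
from fractions import Fraction
from math import factorial

def cofactor(matrix):
    """Exact determinant of `matrix` with its last row and column removed."""
    size = len(matrix) - 1
    a = [[Fraction(x) for x in row[:size]] for row in matrix[:size]]
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

def count_circuits(m):
    """Number of circuits C(M) of formula (1), edges distinguishable."""
    u = len(m)
    # Drop vertices lying on no edges (all-zero row and column).
    keep = [i for i in range(u) if any(m[i]) or any(m[j][i] for j in range(u))]
    mp = [[m[i][j] for j in keep] for i in keep]
    rows = [sum(r) for r in mp]
    cols = [sum(mp[i][j] for i in range(len(mp))) for j in range(len(mp))]
    if rows != cols:
        raise ValueError("graph is not simple: row and column sums differ")
    star = [[(rows[i] if i == j else 0) - mp[i][j] for j in range(len(mp))]
            for i in range(len(mp))]
    result = cofactor(star)
    for r in rows:
        result *= factorial(r - 1)
    return int(result)

# Binary graph on the vertices 00, 01, 10, 11 (edges are binary triples).
de_bruijn = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
```

For the binary graph whose edges are the eight triples, the count should equal the number of binary circular arrays containing each triple once.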

3. Distribution of pairs. In applying graph theory to circular arrangements of 
letters the several occurrences of one letter will be regarded as distinguishable, 
so that there will be $(N - 1)!$ possible circular sequences having a given
frequency count of letters; for the present these sequences will be assumed equally
probable. Let the frequencies of the individual letters be $f_1, f_2, \cdots, f_t$ with
$\sum f_i = N$; the probability P(F) that pairs of successive letters (i, j) will have
the matrix of frequencies $F = [f_{ij}]$, where

$\sum_j f_{ij} = f_i, \qquad \sum_i f_{ij} = f_j, \qquad \sum_i f_i = N,$

may be determined as follows. Imagine an oriented linear graph consisting of one
vertex for each of the t letters in the alphabet, together with $f_{ij}$ distinguishable
oriented edges from vertex i to vertex j $(i, j = 1, \cdots, t)$; then the number of
circuits is C(F) (Eq. (1)). Although each circuit corresponds to a circular
sequence of letters with the pair-frequencies F, the enumeration of the circular
sequences requires distinguishing the $f_i$ uses of the vertex i and then identifying
the $f_{ij}$ edges leading from vertex i to vertex j. Hence the total number of circular
sequences with the pair-frequencies F is

$\left(\prod_i f_i! \Big/ \prod_{i,j} f_{ij}!\right) C(F),$

and the probability of F is

(2)  $P(F) = \frac{\prod_i f_i!}{(N - 1)!\,\prod_{i,j} f_{ij}!}\,C(F).$

(See [14] for a proof that Whittle’s formula (8) in [13] is essentially equivalent to 
ours and can therefore also be derived from the BEST theorem. We independently 
noticed this fact, but would not have done so had Goodman not first drawn our
attention to the existence of Whittle’s paper. We had prepared a second ap-
pendix, in which Whittle’s formula was deduced from the BEST theorem, but 
decided not to include it so as to avoid overlap with [14].) 
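Formula (2) can be compared with a direct enumeration for a small alphabet. This is an illustrative sketch, not the authors' procedure: all function names are hypothetical, every letter is assumed to occur at least once, and the (N - 1)! circular sequences are generated by fixing one occurrence and permuting the rest.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial, prod

def cofactor(matrix):
    """Exact determinant of `matrix` with its last row and column removed."""
    size = len(matrix) - 1
    a = [[Fraction(x) for x in row[:size]] for row in matrix[:size]]
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

def count_circuits(F):
    """C(F) of formula (1); assumes every row sum is positive."""
    rows = [sum(r) for r in F]
    t = len(F)
    star = [[(rows[i] if i == j else 0) - F[i][j] for j in range(t)] for i in range(t)]
    return cofactor(star) * prod(factorial(r - 1) for r in rows)

def formula_probability(F):
    """P(F) according to formula (2)."""
    rows = [sum(r) for r in F]
    numerator = prod(factorial(r) for r in rows) * count_circuits(F)
    denominator = factorial(sum(rows) - 1) * prod(factorial(x) for row in F for x in row)
    return Fraction(numerator, denominator)

def brute_force_distribution(freqs):
    """Distribution of the pair matrix F over all (N - 1)! circular sequences."""
    t = len(freqs)
    letters = [letter for letter, f in enumerate(freqs) for _ in range(f)]
    counts = Counter()
    for tail in permutations(letters[1:]):
        seq = (letters[0],) + tail
        F = [[0] * t for _ in range(t)]
        for a, b in zip(seq, seq[1:] + seq[:1]):  # successive pairs, wrapping around
            F[a][b] += 1
        counts[tuple(map(tuple, F))] += 1
    total = sum(counts.values())
    return {F: Fraction(c, total) for F, c in counts.items()}

dist = brute_force_distribution((2, 2, 1))
```

With letter frequencies (2, 2, 1) the enumeration covers 24 circular sequences, and each observed matrix F should receive the probability given by (2).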


4. Extension to n-tuples. The general formula for the number of circular 
sequences of N given letters having prescribed n-tuple frequencies is obtained by 
considering the oriented linear graph with vertices corresponding to (n - 1)-tuples
and edges corresponding to n-tuples. Let $f_{i_1 \cdots i_n}$ edges (in accordance with
the prescription) run from the vertex $(i_1 \cdots i_{n-1})$ to the vertex $(i_2 \cdots i_n)$. By
the BEST theorem the number of circuits in this graph is C(F) where
$F = [f_{i_1 \cdots i_n}]$ is the incidence matrix of the graph. (For an example of this notation,
combined with the asterisk and cofactor notation of Sec. 2, see the Appendix.)
Each circuit corresponds to $\prod f_i!$ circular sequences with the correct n-tuple
frequencies except that the $f_{i_1 \cdots i_n}$ n-tuples $(i_1 \cdots i_n)$ are given a separate identity.
Hence the total number of circular permutations (of N given letters) having
prescribed n-tuple frequencies is

(3)  $C(F) \cdot \prod f_i! \Big/ \prod (f_{i_1 \cdots i_n}!)$ if $n > 1$, or $(N - 1)!$, if $n = 1$.

If all $(N - 1)!$ circular sequences are equally likely, the probability of specified
m-tuple frequencies, given the n-tuple frequencies (n < m), is simply the ratio of
the corresponding numbers of circular sequences satisfying the requirement, viz.,

(4)  $\dfrac{C([f_{i_1 \cdots i_m}]) \cdot \prod f_{i_1 \cdots i_n}!}{C([f_{i_1 \cdots i_n}]) \cdot \prod f_{i_1 \cdots i_m}!}$, if $n > 1$; $\qquad \dfrac{C([f_{i_1 \cdots i_m}]) \cdot \prod f_i!}{(N - 1)!\,\prod f_{i_1 \cdots i_m}!}$, if $n = 1$.


It should be noted that the m-tuple frequencies imply unique n-tuple frequencies; 
for any other given n-tuple frequencies the probabilities (4) must be replaced by 
zero. The logarithm of the ratio (4) may be compared with the statistic 7L,. — 
VL, in paragraph 8 of Good [4]. 


5. Linear sequences. In most applications, such as the analysis of Markov 
processes, the sequence is linear rather than circular; but the circular model is 
mathematically simpler. For example, in a linear sequence it is not always true 
that the n-tuple frequencies of the linear sequence determine the (n — 1)-tuple 
frequencies; nor do the n-tuple frequencies necessarily determine the n-tuple 
frequencies of the circular sequence obtained by regarding the first letter of the 
linear sequence as the successor of the last letter. The linear sequences ABCAB
and BCABC share the same triples (ABC, BCA, CAB) but differ in pair
frequencies, single-letter frequencies, and in the triple frequencies of the
corresponding circular sequences. [However, the n-tuple frequencies of a linear
sequence do determine the (n - 1)-tuple frequencies unless

(L)  $\sum_j f_{i_1 \cdots i_{n-1} j} = \sum_j f_{j i_1 \cdots i_{n-1}}$

for all $i_1, \cdots, i_{n-1}$. If condition (L) is not satisfied, then the first and last
(n - 1)-tuples (and hence the complete (n - 1)-tuple frequencies as well as the
circular n-tuple frequencies) can be determined.] One way of treating linear
sequences (for another see Whittle [13]) is to regard a linear sequence of N letters as
consisting of N + 1 characters, the new character being a blank placed at the 
end. Then the new n-tuple frequencies (including the n-tuple ending with the 
blank) will determine uniquely the n-tuple frequencies of the corresponding 
circular sequence; and, conversely, the n-tuple frequencies in any circular 
sequence containing a blank will determine uniquely the n-tuple frequencies of 
the corresponding linear sequence formed by cutting the circular sequence right 
after the blank. Hence, in a linear sequence ending with a blank, we may define 
the probability of specified m-tuple frequencies, given the n-tuple frequencies, as 
the value found by circularizing the sequence (retaining the blank) and applying 
formula (4). 
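The ABCAB and BCABC example, and the device of appending a blank, can be illustrated with a few lines of counting code. This is only a sketch: the helper names are hypothetical and the underscore stands in for the blank character.

```python
from collections import Counter

def linear_ngrams(seq, n):
    """Frequencies of the n-tuples of consecutive letters in a linear sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def circular_ngrams(seq, n):
    """Frequencies of n-tuples when the first letter follows the last."""
    wrapped = seq + seq[:n - 1]
    return Counter(wrapped[i:i + n] for i in range(len(seq)))

def linear_from_blanked_circular(seq_with_blank, n, blank="_"):
    """Recover linear n-tuple counts from the circular counts of seq + blank."""
    counts = circular_ngrams(seq_with_blank, n)
    return Counter({g: c for g, c in counts.items() if blank not in g})

same_triples = linear_ngrams("ABCAB", 3) == linear_ngrams("BCABC", 3)
same_pairs = linear_ngrams("ABCAB", 2) == linear_ngrams("BCABC", 2)
same_circular = circular_ngrams("ABCAB", 3) == circular_ngrams("BCABC", 3)
```

The n-tuples of the blank-terminated circular sequence that avoid the blank are exactly the n-tuples of the linear sequence, which is why the circular counts determine the linear ones.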


6. Negligible Markovity. The probability $P(a_1 a_2 \cdots a_N)$ of a specified linear
sequence $a_1, a_2, \cdots, a_N$ of N distinguishable letters under a Markov process of
order n - 1 or less is


II fi !P(a,- . On) P(Gn4a 2° *y)° . - Play On—ni1° * *An-1) 


~ -d,)P(as- **Anai)** > P(Gyeesi° ° -Ay) 


2*° -a,)P(az- "An 4)° . »P(dyenst* e *An—) 
(to be interpreted as zero if any factor in the numerator is zero), or 


Is! TL, Pliy- + <i)!" 


(5) P(a,- + -ay) = - 


TT Plie- iy 


iye**tn-t 


where $0^0$ is interpreted as 1. Now if the n-tuple probabilities of the alphabet and
the n-tuple frequencies of the augmented sequence $a_1, \cdots, a_N, b$ (where b is
the terminal blank) are given, then the (n - 1)-tuple frequencies and
probabilities, and also the single-letter frequencies, are determined. Under these
conditions the probability (5), being also determined, is mathematically
independent of any further knowledge of the (n + 1)-tuple frequencies, so that the
probability of a specified (n + 1)-tuple frequency-count is proportional to the
number of ways in which this count can occur. It follows that, in applying
formula (4) to linear sequences, the assumption that all permutations are
equally likely may be replaced by the more general assumption of Markovity of
order n - 1 or less without affecting the probability of the specified m-tuple
frequencies. If a circular sequence is defined as a linear sequence with the ends
joined (the most natural definition in interpreting the circular sequence as a
Markov process), the total probability of the circular sequence $(a_1, a_2, \cdots, a_N)$
is the sum of the probability (5) over all N cyclic permutations of the linear
sequence $a_1, a_2, \cdots, a_N$. Then by the same argument the probability of a
circular m-tuple frequency-count, as given by formula (4), is valid for all orders
of Markovity up to and including the (n - 1)-st.

7. Asymptotic relationships. The arguments are based on the following
lemma.


LEMMA.

Hypotheses. (a) An experiment (with a parameter N) has, for each value of N
(positive integers tending to infinity) a finite set $F^N = \{F_i^N\}$ of possible outcomes.
$P_N$ (or simply P) and $P'_N$ (or simply P') are two probability measures over $F^N$.

(b) $P'(F_i^N)/P(F_i^N)$ converges in the probability P to unity in the sense that for
all $\eta > 0$ and $\delta > 0$ there exists an $N_0$ such that, for all $N > N_0$,

(6)  $\mathrm{Prob}_P\{P'(F_i^N)/P(F_i^N) \in I_\delta\} > 1 - \eta,$

where $I_\delta$ is the interval $(1 - \delta, 1 + \delta)$, and $P'(F_i^N)/P(F_i^N)$ is regarded as a
statistic whose distribution is determined by P.

(c) $S(F_i^N)$ is a statistic whose cumulative distribution function $\Phi_N$ converges, as N
becomes infinite, to a limiting distribution $\Phi$ under P.

Conclusion. The distribution function $\Phi'_N$ of $S(F_i^N)$ under P' converges to the
same limiting distribution $\Phi$.

Proof. Let $\Lambda_N$ (or simply $\Lambda$) be the set of all indices i for which $S(F_i^N) \le \lambda$,
where $\lambda$ is any real number. Let $\Delta$ be the set of all indices i for which
$P'(F_i^N)/P(F_i^N) \in I_\delta$, for any fixed arbitrary $\delta$. Let $\Delta'$ be the complement of $\Delta$, and
let $\Lambda\Delta$ be the intersection of $\Lambda$ and $\Delta$. Suppose an arbitrary $\eta$ given, and that
$N > N_0(\eta, \delta)$. For all i in $\Delta$, $|P'(F_i^N) - P(F_i^N)| \le \delta P(F_i^N)$. Hence
$\sum_\Delta P'(F_i^N) \ge \sum_\Delta P(F_i^N) - \delta$. But, by hypothesis (b), $\sum_\Delta P(F_i^N) > 1 - \eta$ (a restatement of
(6) above). Therefore $\sum_\Delta P'(F_i^N) > 1 - \eta - \delta$, or $\sum_{\Delta'} P'(F_i^N) < \eta + \delta$. It
follows that $|\sum_{\Lambda\Delta} P'(F_i^N) - \sum_{\Lambda\Delta} P(F_i^N)| \le \delta$ and
$|\sum_{\Lambda\Delta'} P'(F_i^N) - \sum_{\Lambda\Delta'} P(F_i^N)| < 2\eta + \delta$. Therefore
$|\sum_\Lambda P'(F_i^N) - \sum_\Lambda P(F_i^N)| = |\Phi'_N(\lambda) - \Phi_N(\lambda)| < 2(\eta + \delta)$.
But $\eta$ and $\delta$ are arbitrary, and so $\Phi_N(\lambda) \to \Phi(\lambda)$ implies
$\Phi'_N(\lambda) \to \Phi(\lambda)$, and the conclusion follows.

If all the $f_i$ are strictly positive, formula (4) for the probability of an m-tuple
frequency count, given the single-letter frequency count $\{f_i\}$, may be written
in the form

(7)  $P([f_{i_1 \cdots i_m}]) = \left\{\dfrac{N\,\|[f_{i_1 \cdots i_m}]^*\|}{\prod f_{i_1 \cdots i_{m-1}}}\right\} \cdot \left\{\dfrac{\prod f_{i_1 \cdots i_{m-1}}!\,\prod f_i!}{N!\,\prod f_{i_1 \cdots i_m}!}\right\}.$

The second of these two factors will be recognized [7] as the probability
$P'([f_{i_1 \cdots i_m}])$ of the cell entries $[f_{i_1 \cdots i_m}]$ in an ordinary contingency table with
fixed marginal totals $\{f_{i_1 \cdots i_{m-1}}\}$ and $\{f_i\}$, under the hypothesis that the two
attributes are independent. Suppose N approaches infinity while each $f_i/N$
converges to a strictly positive constant $k_i$, with all $(N - 1)!$ circular permutations
always equally probable. Then, for any fixed m, the relative frequencies of the
m-tuples converge in probability to the corresponding product of $k_i$'s:

$f_{i_1 \cdots i_m}/N \to_p k_{i_1} \cdots k_{i_m},$

implying

$\|[f_{i_1 \cdots i_m}]^*\| \big/ N^{t^{m-1} - 1} \to_p \|[k_{i_1} \cdots k_{i_m}]^*\|.$

The latter can be evaluated in various ways including taking out the common
factor in each row and diagonalizing; its value is $(k_1 k_2 \cdots k_t)^{(m-1)t^{m-2}}$. (For
another proof see the Appendix.) But

$\prod f_{i_1 \cdots i_{m-1}} \sim N^{t^{m-1}} (k_1 k_2 \cdots k_t)^{(m-1)t^{m-2}},$

and so the first factor in (7) converges in probability to unity. (The test may be
applied to a linear sequence by applying the test to the corresponding
circularized sequence without the blank (or see [14]).) Therefore, in view of the
lemma, the hypothesis $(H_0)$ of independence may be tested (N large) within the
hypotheses $(H_{m-1})$ of Markovity of order m - 1 (see Good [4]) by any
asymptotic test of contingency in the ordinary contingency table described above.
Indeed, the statistic $-2 \log \lambda_{0,m-1}$ of [4] for testing $H_0$ within $H_{m-1}$ is identical
to the log likelihood-ratio statistic in the corresponding contingency table. (The
asymptotic validity of the usual $\chi^2$ tests on the contingency table corresponding
to a Markov chain has already been indicated by Goodman [6], but the
relationship between the exact probabilities for the contingency table and for the
Markov chain is interesting.)




8. Distribution of pairs. In accordance with Sec. 6, the probability P(F) of the
specified pair-frequencies $F = [f_{ij}]$ arising in a circular permutation (see formula
(2)) is asymptotic (in the stochastic sense) to the probability P'(F) of the same
entries F appearing in a contingency table of fixed marginal totals $\{f_i\}$. By
formulas (1) and (2),

$P(F)/P'(F) = N\,C(F) \big/ \prod f_i!.$

It follows that the expectation E'g of a function g(F) with respect to P' may
be converted to an expectation Eg with respect to P by the relation

(8)  $Eg = \left(N \big/ \prod f_i!\right) E'(g \cdot C).$

For example, the expectation of a product of factorial powers of the cell entries
$\{f_{ij}\}$ of a contingency table with fixed marginal totals $f_{i\cdot} = \sum_j f_{ij}$ and
$f_{\cdot j} = \sum_i f_{ij}$ is (cf. Haldane [7]), under the hypothesis of independence,

(9)  $E' \prod_{i,j} f_{ij}^{(a_{ij})} = \prod_i f_{i\cdot}^{(a_{i\cdot})} \prod_j f_{\cdot j}^{(a_{\cdot j})} \Big/ N^{(a)},$

where

$a_{i\cdot} = \sum_j a_{ij}, \qquad a_{\cdot j} = \sum_i a_{ij} \qquad \text{and} \qquad a = \sum_{i,j} a_{ij}.$

The $a_{ij}$ are nonnegative integers.
[Proof of (9):

$E' \prod f_{ij}^{(a_{ij})} = \sum_{\{f_{ij}\}} \prod f_{ij}^{(a_{ij})} \cdot \dfrac{\prod f_{i\cdot}!\,\prod f_{\cdot j}!}{N!\,\prod f_{ij}!} = \dfrac{\prod f_{i\cdot}^{(a_{i\cdot})} \prod f_{\cdot j}^{(a_{\cdot j})}}{N^{(a)}} \sum \dfrac{\prod (f_{i\cdot} - a_{i\cdot})!\,\prod (f_{\cdot j} - a_{\cdot j})!}{(N - a)!\,\prod (f_{ij} - a_{ij})!}.$

But the terms of the latter sum are zero except when each $f_{ij} \ge a_{ij}$. Hence the
sum may be taken over nonnegative values of $\{f_{ij} - a_{ij}\}$, and so be recognized
as a sum of probabilities in a (new) contingency table, and hence unity. This
expectation is undoubtedly known, but the article by Haldane, who uses the
same method for less general expectancies, is the only reference known to the
authors.] If $f_{i\cdot} = f_{\cdot i} = f_i$, then

$E' \prod f_{ij}^{(a_{ij})} = \prod_i f_i^{(a_{i\cdot})} f_i^{(a_{\cdot i})} \Big/ N^{(a)}.$

Now the relation (8) and some algebraic manipulations give the corresponding
expectation with respect to P:

(10)  $E \prod f_{ij}^{(a_{ij})} = \dfrac{\prod_i f_i^{(a_{i\cdot})} f_i^{(a_{\cdot i})}\;|f_i \delta_{ij} - a_{ij}|}{(N - 1)^{(a)}\,\prod_i f_i},$


where the vertical bars indicate the determinant of the enclosed matrix. 
[Outline of steps necessary to obtain (10): The essence of the problem may be
described as the evaluation of the factor X for which

$E'\left\{\prod f_{ij}^{(a_{ij})} \cdot \begin{vmatrix} f_2 - f_{22} & \cdots & -f_{2t} \\ \vdots & & \vdots \\ -f_{t2} & \cdots & f_t - f_{tt} \end{vmatrix}\right\} = X \cdot E' \prod f_{ij}^{(a_{ij})}.$

Since $f_{ij}^{(a_{ij})} \cdot f_{ij} = a_{ij} f_{ij}^{(a_{ij})} + f_{ij}^{(a_{ij}+1)}$, and since the $f_i$ and the $a_{ij}$ commute with
E', part of X (neglecting for the moment the expectation of terms containing
the "higher powers" $a_{ij} + 1$) is

$\pi = \begin{vmatrix} f_2 - a_{22} & \cdots & -a_{2t} \\ \vdots & & \vdots \\ -a_{t2} & \cdots & f_t - a_{tt} \end{vmatrix}.$

The next terms to consider are those which arise from selecting just one higher
power factor. The effect of increasing one $a_{ij}$ to $a_{ij} + 1$ in $E' \prod f_{ij}^{(a_{ij})}$ is to
multiply the expectation by

$(f_i - a_{i\cdot})(f_j - a_{\cdot j})/(N - a).$

Adding in these new terms gives

$X = \pi - \sum_{i,j=2}^{t} (f_i - a_{i\cdot})(f_j - a_{\cdot j})\,\mu_{ij}/(N - a) + R,$

where $\mu_{ij}$ is the cofactor, in $\pi$, of the entry containing $a_{ij}$, and R,
the remainder, consists of the expectation of terms containing two or more of
the higher powers. But for each such term there is a term of equal and opposite
expectation arising from using a different path through the determinant, namely
the altered path which replaces the first (in the sense of going from
the top down, say) two elements by the other two corners of the rectangle they
span. Hence R = 0, and all that remains to be shown is that

$(N - a)\pi - \sum_{i,j=2}^{t} (f_i - a_{i\cdot})(f_j - a_{\cdot j})\,\mu_{ij} = |f_i \delta_{ij} - a_{ij}|.$

Working backwards one finds, by elementary row and column operations,

$|f_i \delta_{ij} - a_{ij}| = \begin{vmatrix} N - a & f_2 - a_{\cdot 2} & \cdots & f_t - a_{\cdot t} \\ f_2 - a_{2\cdot} & f_2 - a_{22} & \cdots & -a_{2t} \\ \vdots & \vdots & & \vdots \\ f_t - a_{t\cdot} & -a_{t2} & \cdots & f_t - a_{tt} \end{vmatrix} = (N - a)\pi - \sum_{i,j=2}^{t} (f_i - a_{i\cdot})(f_j - a_{\cdot j})\,\mu_{ij}.]$
In particular, 
(11)  $E f_{ij}^{(a)} = f_i^{(a)} (f_j - \delta_{ij})^{(a)} \big/ (N - 1)^{(a)},$

whence the distribution of the frequency of a single pair (i, j) is found to be

(12)  $P(f_{ij} = r) = \dbinom{f_i}{r} \dbinom{N - 1 - f_i}{f_j - \delta_{ij} - r} \Big/ \dbinom{N - 1}{f_j - \delta_{ij}}.$

(The simplest verification is to compute the factorial moments of (12), but it
is also possible to proceed directly with the help of Good and Toulmin [5], p.
46, together with Vandermonde’s theorem.) One way of testing the null
hypothesis of independence in a process against any alternative altering the
probabilities of the consecutive pairs would be to use the familiar-looking statistic

$\sum (f_{ij} - E f_{ij})^2 / E f_{ij}$

(where $E f_{ij}$ is given by (11) with a = 1) on the circularized sample; the
distribution is asymptotically gamma-variate with $(t - 1)^2$ degrees of freedom.
Goodman [6] has found the same statistic except for the use here of the exact mean;
and the analogous likelihood-ratio statistic is given by Hoel [8].
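The single-pair distribution can be compared with an exhaustive enumeration of circular permutations. The sketch below assumes the hypergeometric reading of (12), namely $\binom{f_i}{r}\binom{N-1-f_i}{f_j-\delta_{ij}-r}/\binom{N-1}{f_j-\delta_{ij}}$, which is consistent with the factorial moments (11); the function names are hypothetical and the alphabet is assumed to contain at least two distinct letters.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb

def pair_count_distribution(freqs, i, j):
    """Exact distribution of f_ij over all (N - 1)! circular sequences."""
    letters = [letter for letter, f in enumerate(freqs) for _ in range(f)]
    counts = Counter()
    for tail in permutations(letters[1:]):      # fix one occurrence, permute the rest
        seq = (letters[0],) + tail
        hits = sum(1 for a, b in zip(seq, seq[1:] + seq[:1]) if (a, b) == (i, j))
        counts[hits] += 1
    total = sum(counts.values())
    return {r: Fraction(c, total) for r, c in counts.items()}

def pair_count_formula(freqs, i, j, r):
    """Hypergeometric form assumed for (12)."""
    total = sum(freqs)
    draws = freqs[j] - (1 if i == j else 0)
    if r < 0 or r > draws:
        return Fraction(0)
    return Fraction(comb(freqs[i], r) * comb(total - 1 - freqs[i], draws - r),
                    comb(total - 1, draws))

freqs = (2, 2, 1)
agreement = all(
    pair_count_formula(freqs, i, j, r) == pair_count_distribution(freqs, i, j).get(r, 0)
    for i in range(3) for j in range(3) for r in range(3)
)
```

For the frequencies (2, 2, 1) the enumeration runs over 24 circular sequences for each pair of letters.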


9. Acknowledgment. The authors are indebted to L. A. Goodman for valuable 

comments and suggestions. 
APPENDIX 

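Evaluation of $\|[k_{i_1} \cdots k_{i_m}]^*\|$. We shall prove, without insisting on the
condition

(i)  $k_1 + k_2 + \cdots + k_t = 1,$

that

(ii)  $\|[k_{i_1} \cdots k_{i_m}]^*\| = (k_1 k_2 \cdots k_t)^{(m-1)t^{m-2}} (k_1 + k_2 + \cdots + k_t)^{t^{m-1} - m}.$

Proof. Let $M = [k_{i_1} \cdots k_{i_m}]$. Then M = UG where U is the diagonal
matrix $\{k_{i_1} \cdots k_{i_{m-1}}\}$, and G is the matrix that has zero elements at all places
except in rows and columns with labels of the form

$(i_1, \cdots, i_{m-1})$ and $(i_2, \cdots, i_m)$,

while at these places G has the element $k_{i_m}$. For example, with t = m = 3, and
with $k_1 = a$, $k_2 = b$, $k_3 = c$, and with the rows and columns labelled in the order
aa, ab, ac, ba, bb, bc, ca, cb, cc, we have

$U = \mathrm{diag}(a^2,\ ab,\ ac,\ ba,\ b^2,\ bc,\ ca,\ cb,\ c^2),$

$G = \begin{pmatrix}
a & b & c & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & a & b & c & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & a & b & c \\
a & b & c & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & a & b & c & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & a & b & c \\
a & b & c & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & a & b & c & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & a & b & c
\end{pmatrix}.$
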

The matrix whose cofactors we want is

$M^* = sU - M = U(sI - G) = UK,$

where

$s = k_1 + k_2 + \cdots + k_t.$

The rows of K each add up to zero while the rows and the columns of $M^*$
each add up to zero. Thus the cofactors of all the elements in a fixed row of K
are equal, while the cofactors of all the elements of $M^*$ are equal. Denote their
common value by

$\kappa = \|[k_{i_1} \cdots k_{i_m}]^*\|.$

$\kappa$ may be obtained from the cofactor of any element in the $\nu$th row of K by
multiplying this cofactor by the product of all the elements of U except its $\nu$th
one. Therefore the sum of the cofactors of the diagonal elements of K is equal
to

(iii)  $\kappa\,\dfrac{\sum k_{i_1} \cdots k_{i_{m-1}}}{\prod k_{i_1} \cdots k_{i_{m-1}}} = \dfrac{\kappa\,s^{m-1}}{(k_1 k_2 \cdots k_t)^{(m-1)t^{m-2}}}.$

But it is also equal to plus or minus the coefficient of $\lambda$ in the characteristic
polynomial of K, $|\lambda I - K|$. This coefficient can be found by considering the
eigenvalues of G.

In the above example with t = m = 3, by assuming that the components of
a column eigenvector are $x_1, x_2, \cdots, x_9$, and by multiplying this eigenvector
by G, we easily see that either the corresponding eigenvalue is zero, or else it is
s and $x_1 = x_2 = \cdots = x_9$. These two eigenvalues both occur at least once, so
the characteristic polynomial of G is of the form

(iv)  $|\lambda I - G| = \lambda^\alpha (\lambda - s)^\beta \qquad (\alpha > 0,\ \beta > 0,\ \alpha + \beta = t^{m-1}).$

(A more explicit proof of (iv) is given below.) Hence the characteristic polynomial of
K is of the form

$\lambda^\beta (\lambda - s)^\alpha,$

and, since $\kappa$ is positive, Eq. (iii) shows that $\beta = 1$ and that (ii) is true.

The following checks of equation (ii) will help to clarify its relationship to 
previous literature. If we put $k_1 = k_2 = \cdots = k_t = 1$ and apply the BEST
theorem, then we find that the number of circular arrays that contain each
m-tuple precisely once is

$(t!)^{t^{m-1}} / t^m,$

and this agrees with p. 203 of de Bruijn and Ehrenfest [2]. If instead we put
$k_1 = k_2 = \cdots = k_t = k^{1/m}$, then we find that if each m-tuple is to appear
exactly k times the number of arrays is

$((tk)!)^{t^{m-1}} / (k t^m),$

and this agrees with Theorem 3 of the same paper.

The following proof of (iv) was kindly provided by Mr. O. S. Rothaus. If
we insist on condition (i), and it is easily seen that there is no real loss of
generality in doing so, then G gives the (m - 1)-tuple transition probabilities in
an independent and stationary process. Now $G^{m-1}$ has constant columns; in
fact every entry in the column labelled $i_1, \cdots, i_{m-1}$ is $k_{i_1} \cdots k_{i_{m-1}}$. The stochastic
matrix $G^{m-1}$ is of rank 1. It has the eigenvalue 1 and all other eigenvalues are
zero. Since G is stochastic, it too has the eigenvalue 1; but the eigenvalues of
$G^{m-1}$ are the (m - 1)-st powers of those of G, and so the other eigenvalues of
G must all be zero. This proves (iv) and also that $\beta = 1$.
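Identity (ii) can be spot-checked with exact arithmetic. The sketch below assumes the reading $\|[k_{i_1} \cdots k_{i_m}]^*\| = (k_1 \cdots k_t)^{(m-1)t^{m-2}} s^{t^{m-1}-m}$ with $s = k_1 + \cdots + k_t$; the function names are hypothetical, and the matrix $M^* = sU - M$ is built directly from its definition with rows and columns indexed by (m - 1)-tuples.

```python
from fractions import Fraction
from itertools import product
from math import prod

def cofactor(matrix):
    """Exact determinant of `matrix` with its last row and column removed."""
    size = len(matrix) - 1
    a = [[Fraction(x) for x in row[:size]] for row in matrix[:size]]
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

def kappa(k, m):
    """Common cofactor of M* = sU - M for M = [k_{i1} ... k_{im}], m >= 2."""
    t = len(k)
    labels = list(product(range(t), repeat=m - 1))
    index = {lab: pos for pos, lab in enumerate(labels)}
    s = sum(k)
    star = [[Fraction(0)] * len(labels) for _ in labels]
    for lab in labels:
        weight = prod(k[i] for i in lab)          # diagonal entry of U
        row = index[lab]
        star[row][row] += s * weight
        for last in range(t):                     # edge (i1..i_{m-1}) -> (i2..i_m)
            star[row][index[lab[1:] + (last,)]] -= weight * k[last]
    return cofactor(star)

def closed_form(k, m):
    """Right side of (ii) as read above."""
    t = len(k)
    return prod(k) ** ((m - 1) * t ** (m - 2)) * sum(k) ** (t ** (m - 1) - m)
```

For example, with k = (2, 3) and m = 3 both sides should equal $6^4 \cdot 5 = 6480$.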
ON A STATISTIC WHICH ARISES IN SELECTION AND
RANKING PROBLEMS¹


By Shanti S. Gupta and Milton Sobel


Bell Telephone Laboratories 


1. Summary. The statistic $y = (x_{[p]} - x_0)/s_\nu$ is studied where $x_{[p]}$ is the
maximum of p normal independent chance variables with common mean and
common unknown variance $\sigma^2$, $x_0$ is another independent normal chance variable
with the same mean and the same variance $\sigma^2$, and $s_\nu^2$ (distributed as $\sigma^2 \chi_\nu^2/\nu$
with $\nu$ degrees of freedom) is an estimate of the common variance which is in-
dependent of each one of the above p + 1 chance variables. Several different 
methods are proposed and studied for computing the probability integral of y 
and percentage points of y; in addition, a method for computing percentage 
points without first computing the probability integral of y is considered. A 
table of (upper) percentage points of y is given as Table I at the end of the 
paper. Applications of the statistic y to several ranking and selection problems 
are mentioned in Section 2. Moments of y are given in Section 3. In Section 7 it 
is shown that Table I can be used to obtain an approximation and bounds to 
the percentage points of a related statistic. 


2. Introduction. The statistic y was considered by Gupta [8] for the problem of
selecting a subset of normal populations (with a common unknown variance
$\sigma^2$) which contains the “best” population with probability at least P*. For an
explanation of how the statistic y enters particular problems the reader is
referred to [8] and [9]. If we let $z_i$ denote normal chance variables $N(0, \sigma^2)$ with
correlations $\rho_{ij} = 1/2$ for $i \ne j$ then it is easy to see that

(2.1)  $P\{z_i/s_\nu \le q/\sqrt{2}\ \ (i = 1, 2, \cdots, p)\} = P\{y \le q\}.$

The left-hand member of (2.1) is a special case of the probability integral of the
multivariate analogue of Student’s t-distribution which is treated by Dunnett
and Sobel in [6] and [7]. For p = 2 tables of the percentage points and the
probability integral are given in [6]. Expressions for bounds on the probability
integral for all p are given in [7].

An application of the distributions in (2.1) to a problem of comparing several
populations with a control is given in [5] where percentage points of $y/\sqrt{2}$ are
given to two decimals for p = 1 (1) 9, selected values of $\nu$ and $\alpha$ (or 1 - P*) =
.05, .01. The distribution of y is also needed for the solution of a problem
formulated by Paulson [12]. Three more problems for which the distribution of y is
needed are (i) the selection by a sequential procedure of the “best” population
with probability at least P* of being correct, (ii) the selection by a multi-stage
procedure of a subset which contains the “best” population with probability at
least P* of being correct, and (iii) the selection by a single stage procedure of a
subset which contains all populations “better” than a standard with probability
at least P* of being correct. The last problem is treated in [9].

The reader should distinguish between the statistic y and Nair’s studentized
extreme deviate [10] defined by $(x_{[p]} - \bar{x})/s_\nu$, where $\bar{x}$ is the average of the
p ordered $x_{[i]}$’s.

The statistic v, which corresponds to y for the $\sigma$-known case, is treated in
detail in [8]. Percentage points of v are identical with the values of $\lambda\sqrt{N}$ in
Table I of Bechhofer’s paper [1] for the columns headed (k = i, t = 1; i = 2,
3, $\cdots$, 10).


3. Moments of y. The moments of y can be written as a product of two
expectations. For $r < \nu$

(3.1)  $\mu'_r = E y^r = E\left\{\left(\dfrac{\sigma}{s_\nu}\right)^r\right\} E\left\{\left(\dfrac{x_{[p]} - x_0}{\sigma}\right)^r\right\}$

(3.2)  $\qquad = A_{-r} \sum_{j=0}^{[r/2]} \dbinom{r}{2j} \dfrac{(2j)!}{2^j\,j!}\,a_{p,\,r-2j},$

where [r/2] is the largest integer less than or equal to r/2, $a_{p,i}$ is the ith moment
of the largest of p independent standard normal chance variables, and

(3.3)  $A_\beta = E\left\{\left(\dfrac{\chi_\nu}{\sqrt{\nu}}\right)^\beta\right\} = \left(\dfrac{2}{\nu}\right)^{\beta/2} \dfrac{\Gamma\left(\frac{\nu + \beta}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}$

is the $\beta$th moment of $\chi_\nu/\sqrt{\nu}$ for both positive and negative values of $\beta$, provided
that $\beta > -\nu$. For i = 1 (1) 10 and p = 1 (1) 50 the value of $a_{p,i}$ is obtainable
from Ruben’s paper [13]. For i > 10 the value of $a_{p,i}$ can be obtained with the
aid of some results of Bose and Gupta [2]; in particular, for p < i the value
of $a_{p,i}$ can be obtained, without integration, as a linear function of the $a_{p,j}$
$(j = i - 2, i - 4, \cdots;\ j \ge 0)$.


4. The probability integral of y, and methods for its evaluation. Consider the
expression

$$P = P\{y \le q\}. \tag{4.1}$$

If we hold $x_0$ and $s_\nu$ fixed and integrate first with respect to $x_{[p]}$, then, letting
$q' = q/\sqrt{\nu}$, we obtain

$$P = \int_0^\infty g_\nu(x) \left[\int_{-\infty}^{\infty} F^p(u)\, f(u - xq')\, du\right] dx, \tag{4.2}$$

where $f$, $F$ are respectively the standard normal density and the standard normal
c.d.f. and $g_\nu$ is the chi-density with $\nu$ degrees of freedom. Similarly, holding
another pair fixed, we obtain two alternative forms which are not given here.
Another expression for $P$ in the form of a $p$-fold integral is given in [7].
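
The double integral (4.2) can also be checked by brute-force quadrature. The following is only a sketch under stated assumptions: the function names, the composite Simpson rule and the truncation limits are illustrative choices and not part of the paper, and $g_\nu$ is taken to be the chi density $2^{1-\nu/2} x^{\nu-1} e^{-x^2/2}/\Gamma(\nu/2)$.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi_pdf(x, nu):
    """Chi (not chi-square) density with nu degrees of freedom."""
    if x <= 0.0:
        # limiting value at zero: sqrt(2/pi) for nu = 1, zero for nu > 1
        return math.sqrt(2.0 / math.pi) if (x == 0.0 and nu == 1) else 0.0
    log_g = ((1.0 - 0.5 * nu) * math.log(2.0) - math.lgamma(0.5 * nu)
             + (nu - 1.0) * math.log(x) - 0.5 * x * x)
    return math.exp(log_g)

def simpson(func, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0

def prob_integral(q, p, nu, n_x=200, n_u=200):
    """Numerical value of (4.2): P{y <= q} for given p and nu."""
    q_prime = q / math.sqrt(nu)

    def inner(x):
        shift = x * q_prime
        # f(u - shift) is negligible outside shift +/- 9
        return simpson(lambda u: norm_cdf(u) ** p * norm_pdf(u - shift),
                       shift - 9.0, shift + 9.0, n_u)

    # the chi density is negligible beyond sqrt(nu) + 10
    return simpson(lambda x: chi_pdf(x, nu) * inner(x),
                   0.0, math.sqrt(nu) + 10.0, n_x)

if __name__ == "__main__":
    print(round(prob_integral(math.sqrt(2.0) / 2.0, 2, 2), 5))
```

For $p = 1$ the result reduces to a Student $t$ probability with argument $q/\sqrt{2}$, which gives an independent check on the quadrature.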

METHOD 1. Now we shall derive some series expansions for $P$ using (4.2). If
the function $f$ in (4.2) is expanded about $q' = 0$, then it is easy to justify a term
by term integration and we obtain

$$P = \frac{1}{p+1} \sum_{j=0}^{\infty} \frac{q^j}{j!}\, A_j\, E\{H_j(u)\} = \frac{1}{p+1}\, e^{qAH(a)}, \tag{4.3}$$

where $A_j$ is given by (3.3). Here $H_j(x)$ is the $j$th Hermite polynomial defined
as in [4], the expectation is taken with respect to the density of $u = x_{[p+1]}/\sigma$
given by $(p+1)F^p(u)f(u)$, and the last expression in (4.3) is a symbolic one
in which powers of $A$, $H$ and $a$ (say $A^\alpha$, $H^\alpha$ and $a^\alpha$) are to be replaced by $A_\alpha$,
$H_\alpha$ and $a_{p+1,\alpha}$, respectively. The expansion (4.3) converges for all $q$ and the
convergence is rapid for small values of $q$. For example, for $q = \sqrt{2}/2$, $p = 2$
and $\nu = 2$ using (4.3) with $j = 0$ to 10 and Ruben's table [13] we obtain $P =
.520175$ as compared with the value .52017 given in Table 1 of [6] (in the column
$h = .50$ and the row $n = 2$).
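
A truncated form of the series (4.3) can be evaluated as follows. This is a minimal sketch, assuming that $H_j$ is the Hermite polynomial with recurrence $H_{j+1}(x) = xH_j(x) - jH_{j-1}(x)$ and that $E\{H_j(u)\}$ is obtained by direct trapezoidal integration instead of from Ruben's table; the function names are hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def hermite(j, x):
    """Hermite polynomial H_j(x) via H_{k+1} = x H_k - k H_{k-1}."""
    if j == 0:
        return 1.0
    h_prev, h = 1.0, x
    for k in range(1, j):
        h_prev, h = h, x * h - k * h_prev
    return h

def A(beta, nu):
    """Equation (3.3): the beta-th moment of chi_nu / sqrt(nu)."""
    return math.exp(0.5 * beta * math.log(2.0 / nu)
                    + math.lgamma(0.5 * (nu + beta)) - math.lgamma(0.5 * nu))

def expected_hermite(j, p, n=2000, lim=10.0):
    """E{H_j(u)} when u has density (p+1) F^p(u) f(u) (trapezoidal rule)."""
    h = 2.0 * lim / n
    total = 0.0
    for i in range(n + 1):
        u = -lim + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * hermite(j, u) * (p + 1) * norm_cdf(u) ** p * norm_pdf(u)
    return total * h

def series_43(q, p, nu, terms=11):
    """Partial sum of (4.3) using j = 0, ..., terms - 1."""
    total = 0.0
    for j in range(terms):
        total += q ** j / math.factorial(j) * A(j, nu) * expected_hermite(j, p)
    return total / (p + 1)

if __name__ == "__main__":
    print(series_43(math.sqrt(2.0) / 2.0, 2, 2))
```

At $q = 0$ only the first term survives and the sum equals $1/(p+1)$, the probability that $x_0$ is the largest of the $p + 1$ observations.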

METHOD 2. Another expansion is derived by expanding the function $f$ in
(4.2) about $s_\nu = \sigma$ (i.e., about $x = \sqrt{\nu}$), obtaining

$$P = \int_{-\infty}^{\infty} F^p(u) \left[ f(u - q) - q f'(u - q)\, E(\xi - 1) + \frac{q^2}{2!} f''(u - q)\, E(\xi - 1)^2 - \cdots \right] du, \tag{4.4}$$

where $f^{(\alpha)}(x)$ is the $\alpha$th derivative of the normal density $f(x)$ and
$\xi = x/\sqrt{\nu} = s_\nu/\sigma$.

The first term in (4.4) is the integral for the $\sigma$-known case tabulated in [8] and
[11]. After straightforward integration term-by-term in (4.4) we obtain

$$\begin{aligned}
P = {} & \int_{-\infty}^{\infty} F^p(u) f(u - q)\, du - q(1 - A_1) \int_{-\infty}^{\infty} (u - q) F^p(u) f(u - q)\, du \\
& + \frac{q^2}{2!}\, 2(1 - A_1) \int_{-\infty}^{\infty} [u^2 - 2qu + (q^2 - 1)]\, F^p(u) f(u - q)\, du \\
& + \frac{q^3}{3!}\, (3A_1 - 4 + A_3) \int_{-\infty}^{\infty} [u^3 - 3u^2 q + 3u(q^2 - 1) - (q^3 - 3q)]\, F^p(u) f(u - q)\, du + \cdots,
\end{aligned} \tag{4.5}$$

where $A_\alpha$ is given by (3.3). Each of the integrals above is evaluated by expanding
the factor $e^{qu}$ in $f(u - q)$, thus obtaining a series in terms of $a_{p+1,\alpha}$. In the
symbolic notation used above we can write $P$ in the form

$$P = \int_{-\infty}^{\infty} F^p(u) f(u - q)\, du + \cdots. \tag{4.6}$$

For $p = 2$, $\nu = 2$ and $q = \sqrt{2}/2$ using the first six terms of (4.5) or (4.6) we
obtain $P = 0.5192$. This is comparable with the result $P = 0.5188$ obtained by
using the first six terms of (4.3). Equation (4.5) gives a better answer in this
case since the correct value is 0.52017. In general, we expect (4.5) to give better
results for large $\nu$ than (4.3); however, the computations for (4.5) are more involved.

In writing (4.6) we have used the fact that

$$\int_{-\infty}^{\infty} F^p(u) f(u - q)\, du = \frac{e^{-q^2/2}}{p+1} \left[ 1 + q\, a_{p+1,1} + \frac{q^2}{2!}\, a_{p+1,2} + \cdots \right]. \tag{4.7}$$

This series converges for all $q$ and converges rapidly for small values of $q$. For
example, for $q = \sqrt{2}$ and $p = 2$ using the first eleven terms of (4.7) we obtain
$P = .7442$ as compared with .7452 obtained from [11] by interpolation. It should
be noted that the sum of any finite number of terms in (4.7) gives a lower bound
to $P$ for all integers $p \ge 1$, and all non-negative $q$. This follows from the easily
shown result that $a_{j,\alpha} \ge 0$ for all integers $j \ge 1$, $\alpha \ge 0$. If the tables of Ruben
[13] were extended to moments higher than the tenth, then better accuracy
could be obtained.

METHOD 3. Another method of calculating the probability integral is based
on the result of Seal [15] that the distribution of $v = (x_{[p]} - x_0)/\sigma$ is asymptotically
normal as $p$ tends to infinity. It follows from his result that the third and
higher central moments of $v$ tend to the corresponding moments of the standardized
normal distribution. Since the coefficients involving $\nu$ in $A_{-r}$ in (3.2) tend
to unity as $\nu \to \infty$ it follows that the third and higher central moments of $y$
tend to the corresponding moments of the standardized normal distribution as
both $\nu$ and $p$ tend to infinity. Hence $y$ is asymptotically normally distributed
as both $\nu$ and $p$ tend to infinity. It is therefore reasonable to approximate the
distribution of $y_s = (y - E(y))/\sigma(y)$ by a Gram-Charlier expansion in the
Edgeworth form where

$$E(y) = A_{-1}\, a_{p,1}, \quad \text{and} \quad \sigma(y) = [A_{-2}(a_{p,2} + 1) - (A_{-1} a_{p,1})^2]^{1/2}. \tag{4.8}$$
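Using equation (17.7.3) of [4] and letting $q_s = (q - E(y))/\sigma(y)$ we obtain

$$\begin{aligned}
P = P\{y \le q\} = P\{y_s \le q_s\} = {} & F(q_s) - \left[ \frac{\alpha_3}{3!} f^{(2)}(q_s) \right]
+ \left[ \frac{\alpha_4}{4!} f^{(3)}(q_s) + \frac{10\alpha_3^2}{6!} f^{(5)}(q_s) \right] \\
& - \left[ \frac{\alpha_5}{5!} f^{(4)}(q_s) + \frac{35\alpha_3\alpha_4}{7!} f^{(6)}(q_s) + \frac{280\alpha_3^3}{9!} f^{(8)}(q_s) \right] \\
& + \left[ \frac{\alpha_6}{6!} f^{(5)}(q_s) + \frac{35\alpha_4^2 + 56\alpha_3\alpha_5}{8!} f^{(7)}(q_s)
+ \frac{2100\alpha_3^2\alpha_4}{10!} f^{(9)}(q_s) + \frac{15400\alpha_3^4}{12!} f^{(11)}(q_s) \right] + \cdots,
\end{aligned} \tag{4.9}$$

where $\alpha_r$ is the $r$th standardized (i.e. relative) cumulant of $y$ and given by
$\alpha_r = \kappa_r/\kappa_2^{r/2}$. The $\kappa$'s can be obtained from the moments around the origin $\mu_r'$
(given in Section 3) by the usual formulae. For $p = 9$, $\nu = 18$ and $q_s = 1.70$
we obtain $q = 3.703$ and equation (4.9) gives

P = .955435 - .008743 + .001485 + .003342 - .001355 = .950162

which rounds to .950.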
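
The standardization in (4.8) and the bracketed terms of (4.9) can be sketched in code. The sketch below is illustrative only: the moments $a_{p,1}$, $a_{p,2}$ are computed by trapezoidal integration instead of being read from Ruben's table, the derivatives are evaluated through $f^{(k)}(z) = (-1)^k H_k(z) f(z)$, and all function names are assumptions. The relative cumulants $\alpha_3, \ldots, \alpha_6$ must be supplied by the caller.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def A(beta, nu):
    """Equation (3.3)."""
    return math.exp(0.5 * beta * math.log(2.0 / nu)
                    + math.lgamma(0.5 * (nu + beta)) - math.lgamma(0.5 * nu))

def max_moment(p, i, n=4000, lim=10.0):
    """a_{p,i}: i-th raw moment of the largest of p standard normals."""
    h = 2.0 * lim / n
    total = 0.0
    for k in range(n + 1):
        u = -lim + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * u ** i * p * norm_cdf(u) ** (p - 1) * norm_pdf(u)
    return total * h

def mean_sd_y(p, nu):
    """E(y) and sigma(y) from equation (4.8)."""
    mean = A(-1, nu) * max_moment(p, 1)
    var = A(-2, nu) * (max_moment(p, 2) + 1.0) - mean ** 2
    return mean, math.sqrt(var)

def density_derivative(k, z):
    """k-th derivative of the normal density: (-1)^k H_k(z) f(z)."""
    h_prev, h = 1.0, z
    if k == 0:
        h = 1.0
    for m in range(1, k):
        h_prev, h = h, z * h - m * h_prev
    return (-1) ** k * h * norm_pdf(z)

def edgeworth_terms(qs, a3, a4, a5, a6):
    """Leading term and the four bracketed corrections of (4.9), with signs."""
    f = lambda k: density_derivative(k, qs)
    fact = math.factorial
    return [
        norm_cdf(qs),
        -(a3 / fact(3) * f(2)),
        a4 / fact(4) * f(3) + 10 * a3 ** 2 / fact(6) * f(5),
        -(a5 / fact(5) * f(4) + 35 * a3 * a4 / fact(7) * f(6)
          + 280 * a3 ** 3 / fact(9) * f(8)),
        a6 / fact(6) * f(5) + (35 * a4 ** 2 + 56 * a3 * a5) / fact(8) * f(7)
        + 2100 * a3 ** 2 * a4 / fact(10) * f(9) + 15400 * a3 ** 4 / fact(12) * f(11),
    ]

if __name__ == "__main__":
    mean, sd = mean_sd_y(9, 18)
    print(round(mean + 1.70 * sd, 3))   # q corresponding to q_s = 1.70
```

With all cumulants set to zero, `edgeworth_terms` returns only the normal term $F(q_s)$, which is the first summand .955435 in the worked example.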

METHOD 4. This probability can also be computed by using the Gauss quadrature
method based on the zeros and weight factors of the 15th degree Laguerre
polynomial which are given by Salzer and Zucker in [14]. The values of the
inside integral in (4.2) at the zeros of the Laguerre polynomial were computed
by interpolation in [11]. This gives for $q = 3.703$ the result $P = .9495$ which
rounds to .950. This agrees to three decimal places with the result obtained
above by Method 3.


5. Expression for the percentage points of y. In practice it is of much greater
interest to compute the percentage points corresponding to fixed probability
levels rather than the probability integral. Cornish and Fisher [3] have developed
a technique for computing the percentage points of a statistic directly
without first computing a table of the probability integral values. Since $y$ is
asymptotically normal, their technique is applicable here. Applying their result
we obtain for the upper percentage point $q_s = q_s(P^*)$

$$\begin{aligned}
q_s(P^*) = {} & z_s(P^*) + [\alpha_3 I_a] + [\alpha_4 I_b + \alpha_3^2 I_{a^2}] + [\alpha_5 I_c + \alpha_3\alpha_4 I_{ab} + \alpha_3^3 I_{a^3}] \\
& + [\alpha_6 I_d + \alpha_4^2 I_{b^2} + \alpha_3\alpha_5 I_{ac} + \alpha_3^2\alpha_4 I_{a^2 b} + \alpha_3^4 I_{a^4}] + \cdots,
\end{aligned} \tag{5.1}$$

where $z_s(P^*)$ is the standard normal deviate corresponding to $P^*$, $\alpha_r$ is the relative
cumulant defined earlier, and $I_a$, $I_b$, $I_{a^2}$, $\ldots$ are tabulated in Table I of
[3] for the probability levels $P^* = .75, .90, .95, .975, .99, .995, .9975, .999$ and
$.9995$. Then $q$ can be found from $q_s$ using (4.8).

The same method can be used for lower percentage points except that the
2nd, 4th, $\ldots$ correction terms in (5.1) change sign. Combining the two results
we can obtain "2-sided" percentage points with tails of equal probability,
$(1 - P^*)/2$. It should be noted that the lower percentage point corresponding
to a probability of $1/(p + 1)$ is zero for all values of $\nu$.


6. Construction of tables. The procedure described in Section 5 was used to
compute the percentage points of $y$ in Table I for $P^* = .75, .90, .95, .975$ and
$.99$ for $k = p + 1 = 2, 5, 10\,(1)\,16, 18, 20\,(5)\,40, 50$ and $\nu = 15\,(1)\,20, 24, 30,
36, 40, 48, 60, 80, 100, 120, 360$ and $\infty$.

We may regard the expressions in square brackets in (5.1) as correction terms.
The expression (5.1) was calculated to 3 correction terms and also to 4 correction
terms and the result is given in Table I to two decimal places. In some cases
for small degrees of freedom where this method gives only one decimal agreement
one of the methods in Section 4 was used to find the second decimal. For $p = 1$
(or $k = 2$) the percentage point of $y$ has been checked against $\sqrt{2}$ times the
corresponding percentage point of Student's $t$-distribution.

For example, when $P^* = .95$, $\nu = 18$ and $p = 9$, using 3 and 4 correction
terms in (5.1), we obtain $q_s = 1.645 + .084 - .013 - .017 - \cdots$
(1.699 for 3 terms). This gives $q = 3.697$ (3.702 for 3 terms). Since $p = 9$ the
values $q = 3.70$ and $3.71$ can be checked by a Gauss-Laguerre quadrature
(Method 4 in Section 4). The results obtained were $P(3.70) = .9493$
and $P(3.71) = .9501$. This gives $q/\sqrt{2} = 2.62$ which is comparable with the
result 2.62 given in [5].

The percentage points in all cases should be regarded as having been rounded
to the nearest decimal so that the exact probability may be slightly under or
slightly over the desired value $P^*$.

TABLE I
Percentage points of $y$ for selected values of $p$ and $\nu$ (the tabulated values are not reproduced here)



7. On a related set of percentage points. C. W. Dunnett has considered
the related problem of finding values of $d''$ such that

$$P\{|z_i| < d'' s_\nu \ (i = 1, 2, \ldots, p)\} = P^*, \tag{7.1}$$

where $P^*$ is specified and $z_i$ are normal chance variables $N(0, \sigma^2)$ with correlations
$\rho_{ij} = \frac{1}{2}$ for $i \ne j$ $(i, j = 1, 2, \ldots, p)$. It will be convenient to introduce
the symbols $m = d''\sqrt{2}$ and $m' = m/\sqrt{\nu}$. The exact integral expression which
defines $m'$ and hence also $d''$ is given by

$$\int_0^\infty g_\nu(x) \left\{ \int_{-\infty}^{\infty} [F(u + xm') - F(u - xm')]^p f(u)\, du \right\} dx = P^*. \tag{7.2}$$

It is interesting to note that (7.1) can be written in the form $P\{y_D \le d''\} =
P^*$ where in terms of independent normal chance variables $x_0, x_1, \ldots, x_p$ we
can write $y_D = \max |x_i - x_0|/s_\nu$, in contrast with the statistic

$$y = \max\, (x_i - x_0)/s_\nu.$$


Let $C_p$ denote the hypercube $|z_i| < d'' s_\nu$ $(i = 1, 2, \ldots, p)$, let $P\{C_p \mid m, \nu\}$
denote the left member of (7.1) and let

$$M_p = M_p(m, \nu) = P\{z_i < d'' s_\nu \ (i = 1, 2, \ldots, p)\}, \tag{7.3}$$

which is given, for example, by (4.2) with $q' = d''\sqrt{2/\nu}$. We shall give upper
and lower bounds on $P\{C_p \mid m, \nu\}$ in terms of $M_p$. Using an identity like (4.3.8)
of [4] it can be shown (the proof is omitted) that

$$\begin{aligned}
1 - P\{\text{all } |z_i| < d'' s_\nu\} = {} & (1 - P\{\text{all } z_i < d'' s_\nu\}) + (1 - P\{\text{all } z_i > -d'' s_\nu\}) \\
& - P\{\text{at least one } z_i \ge d'' s_\nu \text{ and at least one } z_j \le -d'' s_\nu\}.
\end{aligned} \tag{7.4}$$

Since the first two terms on the right-hand side of (7.4) are both equal
to $1 - M_p$ we can write (7.4) as

$$2M_p - 1 \le P\{C_p \mid m, \nu\} = 2M_p - 1 + Q \le M_p, \tag{7.5}$$

where $Q \ge 0$ is the last probability in (7.4). For moderately large $p$ we have the
approximation

$$Q \cong P\{\text{at least one } z_i \ge d'' s_\nu\}\, P\{\text{at least one } z_i \le -d'' s_\nu\} = (1 - M_p)^2 \tag{7.6}$$

so that from (7.5) and (7.6) we obtain

$$P\{C_p \mid m, \nu\} \cong M_p^2. \tag{7.7}$$


Hence an approximation $d_2''$ to the percentage points $d''$ for Dunnett's problem
is obtained from a table of $M_p$-values by setting

$$M_p(m, \nu) = (P^*)^{1/2} \quad \text{and} \quad d_2'' = \frac{m}{\sqrt{2}}. \tag{7.8}$$

An upper bound $d_1''$ is obtained from a table of $M_p$-values by setting

$$M_p(m, \nu) = \frac{1 + P^*}{2} \quad \text{and} \quad d_1'' = \frac{m}{\sqrt{2}}. \tag{7.9}$$

Finally a lower bound $d_3''$ is obtained from a table of $M_p$-values by setting

$$M_p(m, \nu) = P^* \quad \text{and} \quad d_3'' = \frac{m}{\sqrt{2}}. \tag{7.10}$$


The approximation obtained by using (7.10) is best for large d”; for smaller 
d” we might consider another lower bound described below. It is conjectured 
that the approximations in (7.6) and (7.7) actually give upper bounds for all p 
so that (7.10) gives a lower bound on the percentage point but this has not been 
rigorously shown. 


APPENDIX TO SECTION 7 


It is also possible to obtain an upper bound for the left member of (7.1) and
a lower bound for the percentage point $d''$ satisfying (7.1). These are obtained
by replacing the $p$-dimensional cube $C_p$ defined by

$$|t_i| \le d'' s_\nu/\sigma; \quad t_i = z_i/\sigma \quad (i = 1, 2, \ldots, p) \tag{7.11}$$

by a central $p$-dimensional ellipsoid $E_p$

$$t' \Sigma^{-1} t \le b^2, \tag{7.12}$$

where $b$ is determined so that they both contain the same hypervolume. By a
well-known argument which plays a prominent part in the work of Neyman
and Pearson and which is omitted here it is easy to show that

$$\int_{C_p} \exp\left(-\tfrac{1}{2} t' \Sigma^{-1} t\right) dt \le \int_{E_p} \exp\left(-\tfrac{1}{2} t' \Sigma^{-1} t\right) dt. \tag{7.13}$$

Using (11.12.3) of [4] and equating hypervolumes, we obtain

$$\frac{\pi^{p/2}\, b^p\, |\Sigma|^{1/2}}{\Gamma\left(\frac{p}{2} + 1\right)} = \left(\frac{2 d'' x}{\sqrt{\nu}}\right)^{p}, \tag{7.14}$$

which determines $b^2 = b^2(x)$. Since $t' \Sigma^{-1} t$ is distributed as $\chi^2$ with $p$ degrees of
freedom we can write for the exact value $P$ of the left member of (7.1)

$$P \le \int_0^\infty P\{\chi_p^2 \le b^2(x)\}\, g_\nu(x)\, dx. \tag{7.15}$$

For even $p$ and any $\nu$ the right hand member of (7.15) can be integrated exactly
and we obtain

$$P \le 1 - \frac{1}{\left(1 + \frac{k^2}{\nu}\right)^{\nu/2}} \sum_{j=0}^{(p/2)-1} \frac{\Gamma\left(\frac{\nu}{2} + j\right)}{j!\, \Gamma\left(\frac{\nu}{2}\right)} \left(\frac{k^2}{\nu + k^2}\right)^{j}, \tag{7.16}$$

where

$$k^2 = \frac{8 d''^2}{\pi} \left[ \frac{\Gamma\left(1 + \frac{p}{2}\right)}{\sqrt{p + 1}} \right]^{2/p}. \tag{7.17}$$

For $\nu = \infty$ and any $p$ (even or odd) it follows from (7.15) that

$$P \le P\{\chi_p^2 \le k^2\}. \tag{7.18}$$

If we set the right hand member equal to $P^*$ and solve for $d''$ using (7.14) then
we obtain a lower bound for $d''$. In particular, we obtain from (7.17), by setting
$k^2 = \chi^2(p, P^*)$ which is the $P^*$-percentage point of $\chi_p^2$, the result

$$d'' \ge \frac{1}{2} \sqrt{\frac{\pi}{2}}\; \chi(p, P^*)\, \frac{(p + 1)^{1/2p}}{\left[\Gamma\left(1 + \frac{p}{2}\right)\right]^{1/p}}. \tag{7.19}$$


Numerical calculations show that the results of this appendix do not give very 
close bounds for large values of d” and P*. They are included here principally 
for their theoretical interest.


CONTRIBUTIONS TO THE THEORY OF RANK ORDER STATISTICS -
THE "TREND" CASE


By I. RICHARD SAVAGE
Stanford University and Center for Advanced Study in the Behavioral Sciences


0. Introduction. In spirit this paper is a continuation of [5] and the techniques
and terminology developed there will be used. Here we are concerned with the
detailed relationships between the probabilities of rank orders under various
"trend" hypotheses. The relationships found are of interest in themselves and in
the theory of nonparametric tests of hypotheses.

Typically we shall be concerned with mutually independent random variables
$X_1, \ldots, X_N$ such that $X_i$ has a distribution function of the form $F(x - \theta_i)$
where the $\theta_i$ form an increasing sequence. Conditions are given under which
one rank order is always more probable than another, one rank order is equally
probable with another, and these results are translated into conditions for admissible
rank order tests. References [1], [6], and [7] summarize information
regarding large sample properties of nonparametric tests of this type of hypothesis.

In Section 1 two definitions of rank order are presented along with some 
“algebraic” properties of rank orders. Section 2 contains an enumeration of the 
hypotheses that we are concerned with. Section 3 presents theory and Section 4 
contains applications. 


1. Rank orders. 

DEFINITION: The rank order corresponding to the $N$ distinct numbers
$x_1, \ldots, x_N$ is the vector $r = (r_1, \ldots, r_N)$ where $r_i$ is the number of $x_j$'s $\le x_i$.
$r$ is a permutation of the first $N$ integers. If in the definition the $x$'s are replaced
by random variables then $R$ will be used instead of $r$. $R$ will be defined with
probability one when the underlying random variables have continuous distributions.

DEFINITION: $r' L_{ij} r$ if $r_k' = r_k$ for $i \ne k \ne j$; $r_i' = r_j$, $r_j' = r_i$; and
$(r_i - r_j)(i - j) > 0$.

Thus, if $r = (2, 3, 6, 5, 4, 1)$ and $r' = (2, 5, 6, 3, 4, 1)$ then $r' L_{24} r$. We shall
write $r'Lr$ as an abbreviation for $r' L_{ij} r$ or to denote that there is a chain of rank
orders $r^1, \ldots, r^t, \ldots, r^T$ such that

$$r' L_{i_0 j_0} r^1 L_{i_1 j_1} \cdots r^T L_{i_T j_T} r.$$

Thus if $r = (2, 3, 6, 5, 4, 1)$ and $r' = (3, 5, 6, 4, 2, 1)$ then $r'Lr$, $T = 2$ and
$r' L_{15} (2, 5, 6, 4, 3, 1) L_{45} (2, 5, 6, 3, 4, 1) L_{24} r$. For many of the hypotheses (see
Theorem 1) that we shall consider, $r'Lr$ will imply that rank order $r'$ is less
probable than rank order $r$.
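
The two definitions above translate directly into code. The following sketch (the function names `rank_order` and `l_pair` are illustrative, not from the paper) computes a rank order from distinct numbers and tests whether two rank orders stand in the relation $r' L_{ij} r$; it then checks the examples given in the text.

```python
def rank_order(xs):
    """r_i = number of x_j <= x_i, for distinct numbers xs."""
    return tuple(sum(1 for xj in xs if xj <= xi) for xi in xs)

def l_pair(r_prime, r):
    """Return (i, j), 1-based with i < j, if r' L_ij r holds, else None."""
    if len(r_prime) != len(r):
        return None
    differing = [k for k in range(len(r)) if r[k] != r_prime[k]]
    if len(differing) != 2:
        return None
    i, j = differing
    swapped = r_prime[i] == r[j] and r_prime[j] == r[i]
    # (r_i - r_j)(i - j) > 0: in r the two ranks are in increasing order
    if swapped and (r[i] - r[j]) * (i - j) > 0:
        return (i + 1, j + 1)
    return None

if __name__ == "__main__":
    r = (2, 3, 6, 5, 4, 1)
    print(l_pair((2, 5, 6, 3, 4, 1), r))                    # single step
    chain = [(3, 5, 6, 4, 2, 1), (2, 5, 6, 4, 3, 1), (2, 5, 6, 3, 4, 1), r]
    print([l_pair(a, b) for a, b in zip(chain, chain[1:])])  # chain with T = 2
```

The relation is not symmetric: interchanging the roles of $r'$ and $r$ reverses the sign of $(r_i - r_j)(i - j)$, so `l_pair(r, r_prime)` returns `None` whenever `l_pair(r_prime, r)` succeeds.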
DEFINITION: $r^*Cr$ (rank order $r^*$ is the complement of rank order $r$) if

$$r_i^* = N + 1 - r_{N+1-i} \quad \text{for } i = 1, \ldots, N.$$

If $r^*Cr$ then $rCr^*$. The necessary and sufficient condition for $rCr$ is that
$r_i + r_{N+1-i} = N + 1$.

If $N^*$ is the largest integer less than or equal to $N/2$, then the number of self
complementary rank orders is $(N^*)!\, 2^{N^*}$ and the ratio of the number of rank
orders to the number of self complementary rank orders is the product of the
odd integers not greater than $N$. Thus most of the rank orders, for large $N$,
occur in complementary pairs. Under a particular type of symmetry (see Theorem 5)
complementary rank orders are equally probable.

Another definition of rank order can be given in the following manner. Let
$\mathfrak{r} = (\mathfrak{r}_1, \ldots, \mathfrak{r}_N)$ where $\mathfrak{r}_i = j$ when the $i$th smallest of the numbers

$$(x_1, \ldots, x_N)$$

is $x_j$. The relationship between $r$ and $\mathfrak{r}$ is: $r_a = b$ is equivalent to $\mathfrak{r}_b = a$ for
$a, b = 1, \ldots, N$. It is easily verified that $r'Lr$ is equivalent to $\mathfrak{r}'L\mathfrak{r}$ and that
$r^*Cr$ is equivalent to $\mathfrak{r}^*C\mathfrak{r}$.
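
The complement, the second form of rank order, and the count of self complementary rank orders can be checked by enumeration for small $N$. This is a small sketch with hypothetical function names; the count is obtained by brute force over all permutations, so it is only practical for small $N$.

```python
from itertools import permutations
from math import factorial

def complement(r):
    """r*_i = N + 1 - r_{N+1-i} (indices are 1-based in the text)."""
    n = len(r)
    return tuple(n + 1 - r[n - 1 - i] for i in range(n))

def second_form(r):
    """Second definition: entry b equals a exactly when r_a = b."""
    out = [0] * len(r)
    for a, b in enumerate(r, start=1):
        out[b - 1] = a
    return tuple(out)

def count_self_complementary(n):
    """Number of rank orders of length n that equal their own complement."""
    return sum(1 for r in permutations(range(1, n + 1)) if complement(r) == r)

if __name__ == "__main__":
    for n in range(2, 7):
        n_star = n // 2
        print(n, count_self_complementary(n), factorial(n_star) * 2 ** n_star)
```

Taking the complement commutes with passing to the second form of rank order, which is the content of the last sentence of the section.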


2. Hypotheses. Throughout we shall make the following assumption. 

ASSUMPTION: The random variables $X_1, \ldots, X_N$ are mutually independent
and each $X_i$ has an absolutely continuous (w.r.t. Lebesgue measure) cumulative
distribution function.

We shall let $F_i(x)$ denote the cumulative distribution function and $f(x, \theta_i)$
the density function of $X_i$. The $\theta_i$'s can be thought of as indices for the density
functions but in many of the hypotheses they will correspond to parameters
about which we shall make further assumptions.

$H_0$: There exists a cumulative distribution function $F(x)$ such that

$$F_i(x) = F(x) \quad \text{for } i = 1, \ldots, N.$$

$H_0$ is our null hypothesis and under $H_0$ each rank order is equally probable.

$H_1$: The $\theta_i$'s are real valued and the following conditions hold:
1. $\theta_1 \le \theta_2 \le \cdots \le \theta_N$.
2. If $\theta_i < \theta_j$ and $x < y$, then

$$\begin{vmatrix} f(x, \theta_i) & f(x, \theta_j) \\ f(y, \theta_i) & f(y, \theta_j) \end{vmatrix} \ge 0$$

with strict $>$ for some $x < y$.
3. $f(x, \theta)$ is a continuous function in $x$ for each $\theta$.
4. The set of points on which $f(x, \theta)$ is positive does not depend on $\theta$.

$H_{2k}$: The conditions of $H_1$ apply except (2) is replaced by: If

$$\theta_{i_1} < \theta_{i_2} < \cdots < \theta_{i_k}$$

and $x_1 < x_2 < \cdots < x_k$ then the determinant of $(a_{mn}) \ge 0$, where
$a_{mn} = f(x_m, \theta_{i_n})$. The inequality is strict for some $x_1 < x_2 < \cdots < x_k$.

$H_3$: The $\theta_i$'s are real valued and the following conditions hold:
1. $\theta_1 \le \theta_2 \le \cdots \le \theta_N$.
2. $f(x, \theta_i) = g(\theta_i) h(x) e^{x\theta_i}$
where $g$ and $h$ are nonnegative functions.

$H_4$: The $\theta$'s are real valued and the following conditions hold:
1. $0 < \theta_1 \le \cdots \le \theta_N$.
2. There is an absolutely continuous cumulative distribution function,
$H(x)$, such that

$$F_i(x) = [H(x)]^{\theta_i}.$$

$H_5$: The $\theta_i = i\theta > 0$, and $f(x, \theta_i) = f(x - i\theta) = f(i\theta - x)$.

The following relationships hold among H, to H; . H, implies H; implies H; 
implies H; . He is the same as H, . H; is compatible with H, , H., and H; but 
not H,. The densities under H, satisfy the monotone likelihood ratio condition, 
the densities under H; have been described as being of Pélya-type k, the densi- 
ties under H; are of exponential type [3], and the distributions under H, have 
been of interest in nonparametric inference ([4], [5]). Hs implies that the density 
functions are symmetric about their medians. In practice, when a one-parameter 
family of distributions is assumed satisfactory for a particular problem, the 
family used often satisfies H; . The Cauchy density with translation parameter 
satisfies only H,; . The distribution following Theorem 2k satisfies the conditions 
of Hx but not Hy, . The extreme value distribution with log @ playing the role 
of the location parameter and the exponential distribution over the negative 
reals satisfy the assumptions of H, . 

A typical problem of interest is to form rank order tests of: 

Hyv—X, , Xy are independently and identically distributed, against the 

alternative hypothesis 

H,—X, - , Xy are independently and normally distributed with common 

variance and E(X;) = a + y,@ where the y;’s are known and @ > 0. 

For hypotheses of the form H, it is always possible to relabel the X’s in 
order to make the y;’s a nondecreasing sequence. In doing the relabeling in order 
to preserve 6 > 0 it might also be necessary to replace all of the original ob- 
servations by their negatives. Two sided alternatives would present no real 
difficulties. The removal of the assumption of continuous distribution functions 
and hence the possibility of ties would be more complicated. 


3. Theoretical results. Due to the last remark made in Sec. 1 it will be apparent
that Theorems 1, 2, and 5 and their corollaries are valid if the $\mathfrak{r}$ definition
of rank order is used instead of $r$. Theorems 3 and 4 and their corollaries
are valid only in terms of $\mathfrak{r}$. The theorems, except Theorem 2k, with $k > 2$, give
conditions for determining when the probability of one rank order is greater
than the probability of another. These conditions yield necessary criteria for a
rank order test to be admissible. Theorem 2k involves a relationship between
$k!$ rank orders, which relationship, for $k > 2$, has not helped in determining
admissibility of rank order tests.
THEOREM 1. If $H_1$, then $r' L_{ij} r$ implies $\Pr(R = r') < \Pr(R = r)$ when
$\theta_i < \theta_j$ and $r_i < r_j$.

PROOF. A direct computation yields

$$\Pr(R = r) - \Pr(R = r') = \int \cdots \int_{-\infty < x_1 < \cdots < x_N < \infty}
\left[ \prod_{k \ne i, j} f(x_{r_k}, \theta_k) \right]
\left[ f(x_{r_i}, \theta_i) f(x_{r_j}, \theta_j) - f(x_{r_j}, \theta_i) f(x_{r_i}, \theta_j) \right]
\prod_{k=1}^{N} dx_k.$$

The first bracket of the integrand is nonnegative since $f(x, \theta)$ is a density
function. From assumption 2 of $H_1$ and since $\theta_i < \theta_j$ the second bracket of the
integrand is always nonnegative and positive for some values of $x_{r_i}$ and $x_{r_j}$,
say $u$ and $v$, such that $u < v$. From assumptions 3 and 4 of $H_1$ the whole integrand
can be made positive in a region of the following type: $x_{r_k}$ is near $u$
for $r_k \le r_i$ and $x_{r_k}$ is near $v$ for $r_k > r_i$. Thus the integral is positive.

Without an assumption like $H_1$, Theorem 1 is false ([5], Sec. 5).

COROLLARY 1.1. If $H_1$, then $r'Lr$ implies $\Pr(R = r) > \Pr(R = r')$, provided
the $\theta_i$ corresponding to those $i$ for which $r_i \ne r_i'$ are not all equal.

COROLLARY 1.2. In testing $H_0$ against $H_1$ with the added restriction

$$\theta_1 < \theta_2 < \cdots < \theta_N$$

an admissible rank order test must have the following property: If $r'Lr$ and the
probability of rejecting $H_0$ is $> 0$ when $R = r'$, then the probability of rejecting
$H_0$ equals 1 when $R = r$.

COROLLARY 1.3. If $H_1$ and $\theta_1 < \cdots < \theta_N$, then $\Pr(R_i = 1) > \Pr(R_{i+1} = 1)$
for $i = 1, 2, \ldots, N - 1$.

Let $r^t$ be a typical rank order of the set of $k!$ possible rank orders which can
be formed by permuting $k$ integers in $k$ positions of the vector describing a rank
order. One element and the $k$ positions determine a set. Denote such a set by
$\mathcal{R}_k$. If the number of interchanges required to bring the $k$ movable coordinates
of $r^t$ into increasing order is even let $c(r^t) = 1$ and otherwise let $c(r^t) = -1$.
Thus for $N = 5$, $k = 3$, and positions 1, 4, 5 the following constitutes an example
of an $\mathcal{R}_3$.

25314
25341
15324
15342
45312
45321

THEOREM 2k: If $H_{2k}$, then

$$\sum_{r^t \in \mathcal{R}_k} c(r^t) \Pr(R = r^t) \ge 0$$

with strict inequality when the $k$ values of $\theta$ corresponding to the variable
ranks are distinct.

PROOF. This theorem is proved in the same manner as Theorem 1.
To show that the results of Theorem 2k are not implied by the conditions of 
Theorem 1, consider the following density function 


| ? 


f(z, 6) = i : 
\g(0)[100 + 10(2 — 0) — &(x — OY — e(x — 8)’), 


where $|\theta| < 1$, $\epsilon$ is a small fixed number and $g(\theta)$ is the normalization factor.
For this density the conditions of Theorem 1 hold but the sign of the determinant
in $H_{23}$ is reversed. Thus not only is the condition of Theorem 2k not
valid for this example but actually the inequality is reversed in the conclusion.


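THEOREM 3. If (1) the assumptions of $H_3$ hold, (2) rank orders $\mathfrak{r}''$ and $\mathfrak{r}$ are
such that

$$\sum_{j=1}^{i} \mathfrak{r}_j'' \ge \sum_{j=1}^{i} \mathfrak{r}_j \quad (i = 1, \ldots, N)$$

and the inequality is strict for at least one value of $i$ and (3) $\theta_i = i\theta$ where $\theta > 0$,
then

$$\Pr(\mathfrak{R} = \mathfrak{r}'') < \Pr(\mathfrak{R} = \mathfrak{r}).$$

PROOF. A direct computation yields

$$\Pr(\mathfrak{R} = \mathfrak{r}) - \Pr(\mathfrak{R} = \mathfrak{r}'') = \left[ \prod_{i=1}^{N} g(\theta_i) \right]
\int \cdots \int_{-\infty < x_1 < \cdots < x_N < \infty} \left[ \prod_{i=1}^{N} h(x_i)\, dx_i \right]
\cdot \left[ \exp\left( \sum_{i=1}^{N} x_i \theta_{\mathfrak{r}_i} \right) - \exp\left( \sum_{i=1}^{N} x_i \theta_{\mathfrak{r}_i''} \right) \right].$$

It is sufficient to show that the last bracket is always positive or equivalently

$$\sum_{i=1}^{N} x_i (\theta_{\mathfrak{r}_i} - \theta_{\mathfrak{r}_i''}) > 0.$$

The identity

$$\sum_{i=1}^{N} x_i (\theta_{\mathfrak{r}_i} - \theta_{\mathfrak{r}_i''}) = \sum_{i=1}^{N-1} (x_i - x_{i+1}) \left[ \sum_{j=1}^{i} (\theta_{\mathfrak{r}_j} - \theta_{\mathfrak{r}_j''}) \right]$$

yields the desired inequality, since:

(a) $x_i - x_{i+1} < 0$,

(b) $\sum_{j=1}^{i} (\theta_{\mathfrak{r}_j} - \theta_{\mathfrak{r}_j''}) = \theta \sum_{j=1}^{i} (\mathfrak{r}_j - \mathfrak{r}_j'') \le 0$,
and $< 0$ for some $i$ by assumptions 2 and 3.

Assumption 2 of Theorem 3 is equivalent to: $\sum_{j=i}^{N} \mathfrak{r}_j \ge \sum_{j=i}^{N} \mathfrak{r}_j''$ for $i =
1, 2, \ldots, N$, and the inequality is strict for some $i$. Incidentally, assumption 2
of Theorem 3 does not imply $\mathfrak{r}''L\mathfrak{r}$. This can be seen by examining

$\mathfrak{r} = (2, 5, 1, 3, 4)$ and $\mathfrak{r}'' = (3, 4, 2, 5, 1)$.

On the other hand, $\mathfrak{r}'L\mathfrak{r}$ does imply assumption 2 of Theorem 3 in the obvious
direction.

COROLLARY 3.1. In testing $H_0$ against $H_3$ (with the added restriction $\theta_i =
i\theta > 0$) an admissible rank order test must be such that if rank orders $\mathfrak{r}''$ and $\mathfrak{r}$
satisfy assumption 2 of Theorem 3, then if the probability of rejecting $H_0$ is positive
when $\mathfrak{r}''$ occurs, the probability of rejecting $H_0$ when $\mathfrak{r}$ occurs must be one.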
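THEOREM 4. Under $H_4$,

$$\Pr(\mathfrak{R} = \mathfrak{r}) = \left( \prod_{i=1}^{N} \theta_i \right) \left[ \prod_{i=1}^{N} \left( \sum_{j=1}^{i} \theta_{\mathfrak{r}_j} \right) \right]^{-1}.$$

PROOF. See Theorem 7a.1 and proof in [5].

COROLLARY 4.1. In testing $H_0$ against $H_4$ with the added restriction $\theta_i = i\theta$
the uniformly most powerful rank order test is based on large values of the statistic

$$T_1(\mathfrak{r}) = \prod_{i=1}^{N} \left( \sum_{j=1}^{i} \mathfrak{r}_j \right)^{-1}.$$

In Corollary 4.1 "uniformly" refers both to $\theta$ and to $H(x)$.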
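
The probability in Theorem 4 is simple to compute, which makes it possible to check numerically that it defines a probability distribution over rank orders. The sketch below assumes the rank order is given in its second form (entry $i$ is the index of the $i$th smallest observation); the function names are illustrative only.

```python
from itertools import permutations
from math import prod

def prob_rank_order(order, theta):
    """Theorem 4: prod(theta_i) / prod_i (sum_{j <= i} theta_{order_j}).

    `order` lists, for i = 1..N, the (1-based) index of the variable
    that produced the i-th smallest observation.
    """
    partial, denom = 0.0, 1.0
    for index in order:
        partial += theta[index - 1]
        denom *= partial
    return prod(theta) / denom

def t1(order):
    """Statistic of Corollary 4.1 (the case theta_i proportional to i)."""
    partial, denom = 0, 1
    for index in order:
        partial += index
        denom *= partial
    return 1.0 / denom

if __name__ == "__main__":
    theta = (0.5, 1.0, 1.5, 2.0)            # theta_i = i * 0.5
    orders = list(permutations(range(1, 5)))
    print(sum(prob_rank_order(o, theta) for o in orders))   # total probability
    print(max(orders, key=lambda o: prob_rank_order(o, theta)))
```

For $N = 2$ the formula gives $\Pr(X_1 < X_2) = \theta_2/(\theta_1 + \theta_2)$, which agrees with integrating $[H(x)]^{\theta_1}$ against the distribution $[H(x)]^{\theta_2}$.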
THEOREM 5. Under $H_5$, if $r^*Cr$ then $\Pr(R = r^*) = \Pr(R = r)$.

PROOF.

$$\Pr(R = r) = \int \cdots \int_{-\infty < x_1 < \cdots < x_N < \infty} \prod_{i=1}^{N} [f(x_{r_i} - i\theta)\, dx_i].$$

Now make the change in variables $x_i = (N + 1)\theta - y_{N-i+1}$ and obtain

$$\begin{aligned}
\Pr(R = r) &= \int \cdots \int_{-\infty < y_1 < \cdots < y_N < \infty} \prod_{i=1}^{N} [f(-y_{N-r_i+1} + \theta(N + 1 - i))\, dy_i] \\
&= \int \cdots \int_{-\infty < y_1 < \cdots < y_N < \infty} \prod_{i=1}^{N} [f(y_{r_i^*} - i\theta)\, dy_i] = \Pr(R = r^*).
\end{aligned}$$

TABLE 1
Admissibility properties of rank order tests of trend


1. Hypothesis 


2. Theorem 
3. Condition 


Statistic!.? 


Ti(r) 
T(r) 


T;(r) 
4 
t 
T(r) 


T(r) = a d(r; » Pi-n/ 


t=1 


N 
T's(r) Zz. d(n, ri) 
i—nt+l 
(N= tn) 


Tx(r) z, (N — 2+ l)d(r; > TNeist) 
(N=tn) 


T3(r) = Z, d(r;, Ten+i) 


(N3n) 


T,(r) = z d(r; , Tayi) 


(N=tn) 


N-1 
Tw(r) = > ld( max rj, ris) —d(rig:, min rj) 


_ 1sjsi lsjsi 


Tu(r) = T(r) _ T(r*), (r? = TN- sp — 


1 For each statistic large values are critical for the alternatives of Sec. 2. 

* Define Ejy as the expected value of the ith smallest observation in a sample of N from 
a normal distribution with mean 0 and variance 1. Also define d(z, y) as lif z < y and as 
Oifz2 y. 

3 If r’Lr implies T(r) > T(r’) the symbol + is recorded. If r’‘Lr implies T(r) 2 T(r’) 
the symbol 0 is recorded. If there exists r’ and r such that r’Lr and T(r) < T(r’) the sym- 
bol — is recorded. 

4 The results are easily obtained. The positive results are found by first examining the 







In the notation of [5] for the two-sample case define $z^*Cz$ to mean that

$$z_i^* = 1 - z_{N-i+1}.$$

Then, under the assumptions of symmetry and translation, $z^*Cz$ implies

$$\Pr(Z = z^*) = \Pr(Z = z).$$

The proof is much like that of Theorem 5. Theorem 6.1 of [5], for the two-sample
case, is implied by Theorem 1.


4. Applications. Many rank order tests have been proposed for the hypotheses 
of Sec. 2. At the present time we present a catalogue, far from complete, of such 
tests. Also included are some new tests. For all the tests listed large values of the 
test statistic are critical for the alternatives under consideration. Information 
regarding these tests is summarized in Table 1. 

The statistic $T_1$ was introduced in Corollary 4.1. The statistic $T_2$ yields the
rank order test whose power function has the largest derivative at $\theta = 0$ in
testing $H_0$ against the alternative that $X_1, \ldots, X_N$ are independently distributed
each with a normal distribution having common variance and

$$E(X_i) = \alpha + i\theta, \quad \theta > 0.$$


The symbol + in the column marked r’Lr means that the test statistic satis- 
fies the admissibility condition of Corollary 1.2 and the symbol — means that the 
test fails to satisfy this condition. The symbol 0 means that T(r’) = T(r) can 
occur when r’Lr and thus Corollary 1.2 can be useful in discriminating between 
tied values of 7’. Positive results are obtained for those test statistics which make 
intercomparisons between all of the coordinates of the rank orders and negative 
results correspond to those test statistics whose structure is rather simple and not 
all of the intercomparisons are made. 

The symbol + in the column labeled $\sum (\mathfrak{r}_j'' - \mathfrak{r}_j) \ge 0$ means that the corresponding
test statistic satisfies the admissibility condition of Corollary 3.1 and
the symbol - signifies the statistic does not satisfy this condition. The results are
like those for the $r'Lr$ column with a few more negative results since the condition
to be filled is stronger. The most interesting of these results is the - for
$T_4$ which is essentially Kendall's tau. Thus for some levels of significance Kendall's
tau is inadmissible amongst the class of rank order tests when considering
trend in exponential alternatives.


$L_{ij}$ relationship. The negative results follow from counter examples. Thus for $T_5$ consider
$r = (3, 2, 5, 4, 1)$ and $r' = (5, 2, 3, 4, 1)$.

5 If $\sum_{j=1}^{i} (\mathfrak{r}_j'' - \mathfrak{r}_j) \ge 0$ for all $i$ and strict inequality for some $i$ implies $T(\mathfrak{r}) > T(\mathfrak{r}'')$
the symbol + is recorded and the symbol - is scored if for some $\mathfrak{r}''$ and $\mathfrak{r}$ satisfying the
partial sums condition $T(\mathfrak{r}'') > T(\mathfrak{r})$.

6 The positive results for $T_2$ and $T_3$ are obtained in the same manner as the proof of
Theorem 3. The positive result for $T_1$ is implied by Corollary 4.1. The negative results are
obtained by constructing counter examples. Thus for $T_4$ consider $\mathfrak{r} = (1, 8, 2, 7, 6, 5, 4, 3)$
and $\mathfrak{r}'' = (4, 5, 3, 6, 7, 8, 1, 2)$.

7 The symbol + is recorded if $r^*Cr$ implies $T(r) = T(r^*)$ and the symbol - is recorded if
for some $r^*$ and $r$ we have $r^*Cr$ and $T(r) \ne T(r^*)$.

8 These results are trivial.

9 References are given to places where the test statistic has been used for the types of
alternatives considered in Sec. 2 without attempting to reflect priority of publication.
Reference [6] summarizes large sample efficiencies for many of the tests.

The symbol + in the column labeled r*Cr means that r*Cr implies T(r*) = 
T(r). Under the conditions of Theorem 5 which frequently hold in practice 
this is a reasonable condition, i.e., rank orders which are equally probable give the 
same value for the test statistic. The sole negative result corresponds to 7; 
which is the optimum statistic for a class of alternatives not included in the al- 
ternatives considered in Theorem 5. 

Since the two forms of rank order, r and r, are equivalent in the sense that one 
determines the other, the statistics in Table 1 could all be expressed either as 
functions of r or r. T, , for instance, appears exactly the same in both cases. 
On the other hand, 7’; is easier to define in terms of r and the natural definition of 
Ts is in terms of r. 

The interpretation of Theorem 1 may be modified to give useful results about 
rank order tests of independence for bivariate distributions. In the density 
function f(x, 6) replace 6 by y and now write it in the form f(x | y), i.e., the con- 
ditional density function of x given y. When X and Y have a joint bivariate 
distribution assume the conditions of H,; (and of Hx») are satisfied for f(x | y). 
It is easily verified that when f(x | y) satisfies the conditions of Hy, then f(y | x) 
satisfies the same conditions. Thus the meaning of ““X and Y are jointly of 
Pélya-type k’’ is clear. 

In the bivariate case define rank order in the following manner: Let
x_1, y_1; ...; x_N, y_N be N pairs of numbers such that no two of the x's (y's) are
equal. Rearrange the order in which the pairs are written to obtain x_(1), y_(1);
...; x_(N), y_(N) where y_(1) < y_(2) < ... < y_(N). The rank order is now given by
the vector r = (r_1, ..., r_N) where r_i is the number of x_(j) ≤ x_(i). When the
pairs x_i, y_i are replaced by random variables X_i, Y_i the rank order r can be
replaced by the corresponding rank order R.
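The definition can be illustrated with a short sketch. It assumes the reading "sort the pairs by y, then let r_i count the x's that do not exceed the i-th x in that order"; the function name `bivariate_rank_order` is hypothetical and not part of the paper.

```python
def bivariate_rank_order(pairs):
    """Return r = (r_1, ..., r_N) for a list of (x, y) pairs without ties.

    Assumed reading of the definition: the pairs are rearranged so that the
    y's increase, and r_i is the number of x's that are <= the i-th x in
    that arrangement.
    """
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    if len(set(xs)) != len(xs) or len(set(ys)) != len(ys):
        raise ValueError("no two x's (y's) may be equal")
    ordered = sorted(pairs, key=lambda p: p[1])   # rearrange by increasing y
    x_sorted = [x for x, _ in ordered]
    return [sum(1 for xj in x_sorted if xj <= xi) for xi in x_sorted]


if __name__ == "__main__":
    print(bivariate_rank_order([(3.0, 10), (1.0, 30), (2.0, 20)]))
```

When x increases with y the result is (1, 2, ..., N); when x decreases as y increases it is (N, ..., 2, 1).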

When the pairs X_i, Y_i are mutually independent with a common density function
with respect to Lebesgue measure, then the random variable R is defined
with probability one. If the null hypothesis (X_i and Y_i are independent) is also
true then Pr(R = r) = 1/N!. On the other hand, if X_i and Y_i are jointly of
Pólya-type k, then the results of Theorem 2k hold. The proof consists of noting
that for given y_(1), ..., y_(N) the desired inequalities hold between the
probabilities of rank orders. Thus the inequalities must hold unconditionally since
m(y) > n(y) implies Em(y) > En(y).

The statistics T,, and 7, have been frequently proposed as tests of inde- 
pendence and from the results of the preceding paragraph we obtain some 
evidence that they are admissible when the underlying bivariate distribution is of
Pólya-type 2. The other tests in Table 1 could also be used as tests of
independence. Those tests which have negative signs in the r*Cr column, however,
will be inadmissible.
RANDOM UNIT VECTORS II: USEFULNESS OF GRAM-CHARLIER AND
RELATED SERIES IN APPROXIMATING DISTRIBUTIONS¹


By David Durand and J. Arthur Greenwood


Massachusetts Institute of Technology and Harvard University


0. Summary. The distribution of the sum of n random coplanar unit vectors 
and of a given component of the sum has been discussed by many authors, who 
have shown that each distribution can be approximated in series that are asymp- 
totically normal. But the difficult question of the usefulness of these approxima- 
tions for finite n—in particular for small n—has not been exhaustively treated. 
Accordingly, this paper reexamines some analyses of Pearson’s series for the 
vector sum, presents corresponding series for a component, and examines the 
accuracy of the latter series. 


1. Basic formulas. Given a sample of random coplanar unit vectors
[cos \xi_i, sin \xi_i], where all values of \xi_i (i = 1, 2, ..., n) between 0 and
2\pi are equally likely, we define the quantities

      V = \sum_i \cos \xi_i,   W = \sum_i \sin \xi_i,   R = (V^2 + W^2)^{1/2}.

According to Kluyver [10], the probability that 0 ≤ R ≤ r is

(1)   P_R(r, n) = r \int_0^\infty [J_0(t)]^n J_1(rt) \, dt,

and the probability that V ≤ v is

(2)   P_V(v, n) = \frac{1}{2} + \frac{1}{\pi} \int_0^\infty [J_0(t)]^n \frac{\sin vt}{t} \, dt.

Differentiating (2) yields a formula for the differential of probability, namely

(3)   dP_V(v, n) = \left[ \frac{1}{\pi} \int_0^\infty [J_0(t)]^n \cos vt \, dt \right] dv,

explicitly given by Lord [11].
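The quantities V, W and R are easy to simulate, which gives a rough independent check on any tabulated value. The sketch below is a plain Monte Carlo experiment, not the quadrature method used in the paper; the function names, the seed and the number of trials are arbitrary choices made for illustration.

```python
import math
import random


def simulate(n, trials, seed=0):
    """Draw `trials` samples of (V, R) for the sum of n random unit vectors."""
    rng = random.Random(seed)
    vs, rs = [], []
    for _ in range(trials):
        v = w = 0.0
        for _ in range(n):
            xi = rng.uniform(0.0, 2.0 * math.pi)  # direction uniform on (0, 2*pi)
            v += math.cos(xi)
            w += math.sin(xi)
        vs.append(v)
        rs.append(math.hypot(v, w))               # R = (V^2 + W^2)^(1/2)
    return vs, rs


def empirical_cdf(sample, x):
    """Proportion of the sample that is <= x."""
    return sum(1 for s in sample if s <= x) / len(sample)


def sample_variance(sample):
    mean = sum(sample) / len(sample)
    return sum((s - mean) ** 2 for s in sample) / (len(sample) - 1)


if __name__ == "__main__":
    vs, rs = simulate(n=7, trials=20000)
    print("variance of V:", sample_variance(vs))       # theory: n/2
    print("P(R <= 3), n = 7:", empirical_cdf(rs, 3.0))
```

With a few tens of thousands of trials the estimates are good to about two decimals only, which is far short of the five-decimal quadratures discussed in the text.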


2. Series approximation of the R-distribution. As an asymptotic approximation
to (1), the method of steepest descent yields a formula originally due to
Rayleigh [16], namely,

(4)   1 - e^{-x},

where x = r^2/n. It will be seen that (4) is the volume under the two-dimensional
Gaussian bell

      dP = \frac{1}{\pi n} e^{-(v^2 + w^2)/n} \, dv \, dw

Received November 12, 1956. 

1 This is a revision of the paper ‘“‘Approximation to the Distribution of the Sum of 
Cosines of Random Angles (Preliminary Report) ,’’ which was presented at the Ann Arbor 
meeting of the Institute of Mathematical Statistics, August 30, 1955. 
inside the circle v^2 + w^2 = nx. As an approximation to the differential
(1/2\pi r) d/dr P_R(r, n), Pearson [14] derived an asymptotic series having the form

(5)   \frac{1}{2\pi r} \frac{d}{dr} P_R(r, n) = \frac{e^{-x}}{\pi n} \sum_{i=0}^{\infty} c_i L_i(x),

where the L_i(x) are Laguerre polynomials, and the c_i are as follows:

      c_0 = 1,                               c_1 = 0,
      2! c_2 = -1/2n,                        3! c_3 = -2/3n^2,
      4! c_4 = (6n - 11)/8n^3,               5! c_5 = (50n - 57)/15n^4,
      6! c_6 = -(1892 - 2125n + 270n^2)/144n^5.


Pearson believed that series (5) through c_6 would provide a satisfactory
approximation for n > 6. Lord [13], however, writes: "These formulae have been
very little tested for s > 1, but they would appear to behave rather like the
Type A series (s = 1)² and to give satisfactory approximations for nearly normal
distributions except at the tails. In the case of the distribution of the sum of n
coplanar random vectors of equal magnitude, Pearson concluded that five terms
[i.e., through c_6] of the series were enough to give four-decimal accuracy for
n = 7, but investigations which the author hopes to publish shortly suggest that
he was rather optimistic." Lord supports his belief with some illustrative
calculations for n = 4, 6, 8, and 10. On the basis of our own calculations we concur
with Lord's view, although we realize that the suitability of an approximation
depends upon the number of reliable decimal places that one demands to work
with, and this in turn depends upon whether one wishes to approximate the
central portion of the distribution or the tails. For significance tests one wants
fairly accurate points in the tail.

In considering the accuracy of a series approximation, one may be interested in
two things: first, how many decimals a fixed number of terms will yield; and
second, the most profitable term to stop at. We missed the second point in our
previous paper, where we presented a table (Table 3 of Greenwood and Durand
[6]) showing calculated values of (1) for n = 7 and n = 14 for comparison with
Rayleigh's approximation (4), the integral of Pearson's approximation (5)
through c_6, and a rearrangement of the integrated series through n^{-3}. On
reexamining the calculations underlying this table, we find that the term in n^{-1}
secures about three decimal accuracy; and that for n = 7, further terms are not
very helpful, but for n = 14 the term in n^{-2} increases accuracy roughly from
three decimals to four (see Table 1).
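As an illustration of how much the first correction contributes, the sketch below evaluates Rayleigh's approximation (4) and the same quantity with a term in n^{-1} added. The corrected formula is our own integration of series (5), assuming that its leading correction is -(1/2n) times the standard Laguerre polynomial L_2(x) = 1 - 2x + x^2/2 (a reading consistent with the first correction term -\phi^{(4)}(z)/16n quoted for the V-distribution) and using \int_0^x e^{-u} L_2(u) du = e^{-x}(x - x^2/2). The function names are hypothetical.

```python
import math


def rayleigh(r, n):
    """Rayleigh's approximation (4): 1 - exp(-x) with x = r^2 / n."""
    x = r * r / n
    return 1.0 - math.exp(-x)


def rayleigh_with_first_correction(r, n):
    """Approximation (4) plus the term in 1/n.

    Assumed form: 1 - e^{-x} - e^{-x} * x * (1 - x/2) / (2n), obtained by
    integrating the leading correction of series (5) term by term.
    """
    x = r * r / n
    return 1.0 - math.exp(-x) - math.exp(-x) * x * (1.0 - x / 2.0) / (2.0 * n)


if __name__ == "__main__":
    for n in (7, 14):
        for r in (1, 2, 3):
            print(n, r, round(rayleigh(r, n), 5),
                  round(rayleigh_with_first_correction(r, n), 5))
```

For n = 7 and r = 1 the two functions give about 0.13312 and 0.12491, against the quadrature value 0.12500 in Table 1, which is the "about three decimal accuracy" described above.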


3. Series approximation of the V-distribution. Early writers on the random 
walk—including Einstein [4], Rayleigh [17], and Wiener [20]—recognized that the 
distribution of V is asymptotically normal. The mean is obviously zero and the 


2 Lord is considering the generalizations of the R- and V-distributions to spaces of s 
dimensions (cf. Watson [18], p. 420-421) 
TABLE 1
Values of P_R(r, 7) and P_R(r, 14) and several approximations

        P_R(r, n)        Approximation through terms in
  r     by quadrature    n^0        n^{-1}     n^{-2}     n^{-3}     c_6

n = 7

  1     0.12500          0.13312    0.12491    0.12497    0.12500    0.12530*
  2     0.41782          0.43528    0.41882    0.41817    0.41819    0.41819
  3     0.71404          0.72355    0.71448    0.71349    0.71337*   0.71305
  4     0.90039          0.89830    0.90067    0.90087    0.90082    0.90095
  5     0.97864          0.97188    0.97752    0.97832    0.97846    0.97857
  6     0.99788          0.99416    0.99753    0.99781    0.99785    0.99779
  7     1.00000          0.99909    1.00023    1.00011    1.00006    1.00003

n = 14

  1     0.06667          0.06894    0.06665    0.06666    0.06667    0.06668
  2     0.24193          0.24852*   0.24195    0.24192    0.24193*   0.24195*
  3     0.46583          0.47421    0.46602    0.46583    0.46583    0.46583
  4     0.67524          0.68109    0.67551    0.67525    0.67524    0.67521
  5     0.83105          0.83232    0.83118    0.83107    0.83105    0.83104
  6     0.92570          0.92357    0.92558    0.92570    0.92570    0.92571
  7     0.97285          0.96980    0.97263    0.97283    0.97285    0.97286
  8     0.99197          0.98966    0.99183    0.99195    0.99196    0.99196
  9     0.99815          0.99693    0.99813    0.99815    0.99815    0.99814
 10     0.99969          0.99921    0.99973    0.99970    0.99969    0.99969
 11     0.99997          0.99982    1.00000    0.99997    0.99997    0.99997
 12     1.00000          0.99997    1.00002    1.00000    1.00000    1.00000

* These values correct erroneous entries in Table 3 of [6].


variance is easily shown to be n/2. Horner [8] recapitulated these results and
gave (p. 153) the specific formulas:

(6)   dP_V(v, 1) = \frac{dv}{\pi (1 - v^2)^{1/2}},

      dP_V(v, 2) = \frac{dv}{\pi^2} \int_{|v|-1}^{1} \frac{du}{(1 - u^2)^{1/2} [1 - (u - |v|)^2]^{1/2}},

and showed that dP_V(v, n) may be computed by convolving dP_V(v, n - 1) with
dP_V(v, 1). Lord [11], more generally, showed that dP_V(v, n) may be computed
by convolving dP_V(v, n - k) with dP_V(v, k). Since Horner considered computation
by convolution difficult, in which view we concur (see below), he derived a
modified Pearson series and employed it to estimate dP_V(v, 7). He evidently
thought highly of this series approximation, since he used it as a standard against
which to test the simple normal approximation for n = 7, and he even considered
the normal approximation "very close." Slack ([15], p. 77) considered the
distribution of V effectively normal "except when n is very small, i.e., <10." We do
not share this optimism, though it is again a question of how many decimal
places in what section of the curve are required to render the fit "very close."
Horner did not present the modified Pearson series, and to our knowledge, no
one else has. We therefore derived the series according to the method described by
Lord ([13], p. 347), and we present it below, partly to support subsequent
computations and partly because we think the term in n^{-1} may be useful. In this
series, the substitution z = v(2/n)^{1/2} makes the variance unity and simplifies the
series so that

(7)   dP_V(v, n) = dz \sum_{i=0}^{\infty} c_i \phi^{(2i)}(z) / (-2)^i.

Here, the notation

      \phi^{(i)}(z) = (2\pi)^{-1/2} \frac{d^i}{dz^i} e^{-z^2/2}

conforms to the Harvard tables [7], which are probably the best means for
evaluating the series; and the c_i are identical with those in Pearson's series.
Integration of (7), term by term, gives P_V(v, n).

To establish limits of error for (7) or its integral is a problem for which we have
found no simple, systematic solution. In the form given, (7) is a Gram-Charlier
series of Type A. Theorem 4 of Cramér [3] indicates that it converges absolutely
for n ≥ 3, and its integral converges absolutely for all n. It is sometimes
convenient to rearrange the terms of (7) in decreasing powers of n, yielding an
Edgeworth series. Theorems 2 and 3 of Cramér indicate that this Edgeworth
series and its integral through n^{-k} are asymptotic with error O(n^{-k-1}). Finally, a
theorem by Esseen ([5], p. 43) establishes bounds to the discrepancy between
P_V(v, n) and its normal approximation. But none of these facts seems to have any
great practical value. Esseen's inequality indicates that

      \left| \int_{-\infty}^{z} \phi(t) \, dt - P_V(v, n) \right| < 9.003 n^{-1/2};

and this implies that a sample of some 10^8 observations is required to assure
three decimal accuracy. This, of course, is absurd, since the distribution of V is
symmetrical and the error is O(n^{-1}), not O(n^{-1/2}). But no one, to our knowledge,
has worked out bounds for a symmetrical distribution.
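The arithmetic behind the sample-size remark can be sketched as follows. Only the constant 9.003 comes from the text; the tolerance 0.0005 is our assumed reading of "three decimal accuracy", and the function names are hypothetical.

```python
import math

ESSEEN_CONSTANT = 9.003  # constant quoted in the text for this distribution


def esseen_bound(n):
    """Bound on |Phi(z) - P_V(v, n)| of the form C * n^(-1/2)."""
    return ESSEEN_CONSTANT / math.sqrt(n)


def sample_size_for(tolerance):
    """Smallest n for which the bound does not exceed `tolerance`."""
    return math.ceil((ESSEEN_CONSTANT / tolerance) ** 2)


if __name__ == "__main__":
    # Assumed meaning of three-decimal accuracy: error at most 0.0005.
    print(sample_size_for(0.0005))
```

Under that assumption the required n is a few hundred million, which is the order of magnitude the authors call absurd.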

In hopes of setting reasonable bounds for the error in the series approximations,
we proceeded to ascertain certain values of (2) and (3) that were fairly
easy to compute and to compare these with the approximations. The maximum
ordinate

      dP_V(0, n)/dz = \frac{1}{\pi} (n/2)^{1/2} \int_0^\infty [J_0(t)]^n \, dt

was fairly easily calculated by quadratures with available equipment. To do this
job, a punched card table of the Bessel function J_0 was involuted and summed on
IBM machines. Then the sum was corrected for curvature at the upper end;
and when necessary, the portion of the integral lying outside the limits of the
TABLE 2
Values of dP_V(0, n)/dz and several approximations.

       dP_V(0, n)/dz    Approximation through terms in
  n    by quadrature    n^{-1}     n^{-2}     n^{-3}     c_3           c_4        c_5        c_6

  3    0.34948          0.37401    0.37386    0.37418    0.3647?       0.36831    0.37582    0.38110
  4    0.40637          0.38024    0.38016    0.38029    (illegible)   0.37782    0.38147    0.38287
  5    0.37928          0.38398    0.38393    0.38400    (illegible)   0.38273    0.38475    0.38515
  6    0.38947          0.38648    0.38644    0.38648    0.38417       0.38574    0.38697    0.38706
  7    0.38742          0.38826    0.38823    0.38825    (illegible)   0.38779    0.38859    0.38858
  8    0.39002          0.38959    0.38957    0.38959    0.38???       0.38928    0.38983    0.38979
  9    0.39048          0.39063    0.39061    0.39063    (illegible)   0.39041    0.39080    0.39075
 10    0.39152          0.39146    0.39145    0.39146    (illegible)   0.39130    0.39159    0.39154


punched card table was evaluated by integrating the asymptotic series for
[J_0]^n. As a check the involution and summation was repeated on an entirely
different series of IBM machines, and finally the integral

      \int_0^{30.6346065} J_0(t) \, dt

(that is, from 0 to j_{10}, the tenth positive zero of J_0) was compared with the
value given by Watson ([18], p. 752) for half this integral.

From the comparisons given in Table 2, one sees that the term -\phi^{(4)}(z)/16n,
which is the first correction term in either the Gram-Charlier or the Edgeworth
series, produces a substantial improvement over the simple normal approximation
0.39894. But the contribution of further terms is doubtful, to say the least.
This is particularly true of the Edgeworth series, since the error through n^{-1} is
positive for odd n and negative for even n so that inclusion of one or more further
terms must improve half of the approximations at the expense of the other half.
In effect, it appears that the Edgeworth series either does not converge, or
converges to the wrong value, and that the Gram-Charlier series converges too slowly
to be of great use, if refinements over the first correction term are required.
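The first correction term is simple enough to evaluate directly. The sketch below computes \phi(z) - \phi^{(4)}(z)/16n using the identity \phi^{(4)}(z) = (z^4 - 6z^2 + 3)\phi(z); the function names are ours.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def phi4(z):
    """Fourth derivative of the normal density: (z^4 - 6 z^2 + 3) * phi(z)."""
    return (z ** 4 - 6.0 * z ** 2 + 3.0) * phi(z)


def density_first_correction(z, n):
    """Normal density plus the first correction term, phi(z) - phi4(z)/(16 n)."""
    return phi(z) - phi4(z) / (16.0 * n)


if __name__ == "__main__":
    for n in range(3, 11):
        print(n, round(density_first_correction(0.0, n), 5))
```

At z = 0 this reduces to \phi(0)(1 - 3/16n), which gives 0.37401 for n = 3 and 0.39146 for n = 10, the values tabulated for the term in n^{-1}.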

Table 2 gives a fair notion of the accuracy of the series approximation through
n^{-1}, since the error

      | dP_V(v, n)/dz - \phi(z) + \phi^{(4)}(z)/16n |

is bound to be large at v = 0; in fact, we are able to show that it achieves a
local maximum there for n = 5, 6, 7, 8, and 9. The first derivative of this error is
easily shown to be zero at v = 0. The second derivative

      (-1)^n \left[ -\frac{1}{\pi} (n/2)^{3/2} \int_0^\infty t^2 [J_0(t)]^n \cos vt \, dt - \phi^{(2)}(z) + \phi^{(6)}(z)/16n \right]

reduces to

(8)   (-1)^n \left[ -\frac{1}{\pi} (n/2)^{3/2} \int_0^\infty t^2 [J_0(t)]^n \, dt + (1 - 15/16n) \phi(0) \right]
TABLE 3
Values of P_V(v, 4) and several approximations

 z =            P_V(v, 4)        Normal           Approximation through terms in
 v(2/n)^{1/2}   by quadrature    approximation    n^{-1}     n^{-2}     n^{-3}     c_3        c_4

 1.6            0.94452          0.94520          0.94398    0.94367    0.94355    0.94460    0.94410
 1.8            0.96404          0.96407          0.96460    0.96461    0.96454    0.96545    0.96500
 2.0            0.97820          0.97725          0.97894    0.97921    0.97922    0.97978    0.97947
 2.2            0.98821          0.98610          0.98834    0.98878    0.98886    0.98902    0.98889
 2.4            0.99482          0.99180          0.99412    0.99460    0.99472    0.99456    0.99458
 2.6            0.99860          0.99534          0.99741    0.99782    0.99794    0.99763    0.99773
 2.8            0.99998          0.99744          0.99912    0.99940    0.99949    0.99916    0.99929
 2.828          1.00000          0.99769          0.99929    0.99955    0.99068    0.99081    0.99044

for v = 0. The quantity on the right is easily evaluated, and we were able to 
evaluate the Bessel-function integral by quadratures for n = 5, 7, 8, and 9; it is 
infinite for n = 6. (8) is indeed negative for n = 5, 7, 8, and 9; thus the error
achieves a local maximum. For n = 6, (8) is negatively infinite. 

Although convolution of the V-distribution is generally difficult, as Horner
indicates, values in the tail of dP_V(v, 4)/dv and P_V(v, 4) are fairly easily
obtained with modern computing equipment. We were able to evaluate (6) on the
Harvard Mark IV computer by means of Bronwin's formula for numerical
integration (cf. Whittaker and Robinson [19], p. 159) and then to obtain
dP_V(v, 4)/dv by quadrature. This operation was necessarily limited to the
portion of dP_V(v, 4)/dv unaffected by the singularity of dP_V(v, 2)/dv, that is, the
portion outside v = 2. Finally, hand integration of dP_V(v, 4)/dv produced values
of P_V(v, 4) for comparison with the series approximations in Table 3. Here,
again, the first correction term effects a substantial improvement over the simple
normal approximation, but additional terms contribute little. Note that the
n^{-1} term provides almost three-decimal accuracy.
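The integrated form of the first correction can be checked in the same way. Integrating \phi(z) - \phi^{(4)}(z)/16n term by term gives \Phi(z) - \phi^{(3)}(z)/16n, with \phi^{(3)}(z) = (3z - z^3)\phi(z). The sketch below is an illustration with hypothetical function names; it uses `math.erf` for the normal integral.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cdf_first_correction(z, n):
    """Phi(z) - phi'''(z) / (16 n), where phi'''(z) = (3z - z^3) * phi(z)."""
    return normal_cdf(z) - (3.0 * z - z ** 3) * phi(z) / (16.0 * n)


if __name__ == "__main__":
    for z in (1.6, 1.8, 2.0, 2.2):
        print(z, round(normal_cdf(z), 5), round(cdf_first_correction(z, 4), 5))
```

For n = 4 and z = 1.6 this gives 0.94398 against a normal value of 0.94520 and a quadrature value of 0.94452 in Table 3.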


4. Normalization of the V-distribution. The method of Cornish and Fisher [2]
(cf. Kendall [9], Secs. 6.32 and 6.33) provides the means of deriving an
approximately normal variate y (with unit variance) as a series expansion in
z = v(2/n)^{1/2} and n. This series through n^{-3} is

      y = z + (z^3 - 3z)/16n + (71z^5 - 224z^3 - 15z)/4608n^2
            + (385z^7 - 1323z^5 - 981z^3 + 1575z)/73728n^3.

It may be reverted to give z as a function of y, with the following result for terms
through n^{-3}:

(9)   z = y - (y^3 - 3y)/16n - (17y^5 - 8y^3 - 177y)/4608n^2
            - (33y^7 + 165y^5 - 1989y^3 + 999y)/73728n^3.


A series of this sort is of particular interest to statisticians for approximating 
percentage points of a distribution. Table 4 compares various approximations 
TABLE 4
Percentage points of P_V(v, 4) and several approximations

          Percentage point    Normal           Approximation through terms in
 P_V      of P_V(v, 4)        approximation    n^{-1}        n^{-2}        n^{-3}

 .95      1.65010             1.64485          1.6524?       (illegible)   1.65496
 .975     1.94862             1.95996          (illegible)   (illegible)   1.93402
 .99      2.24588             2.32635          2.23868       (illegible)   2.22977
 .995     2.40712             2.57583          (illegible)   (illegible)   2.40886
 .999     2.63436             3.09023          2.7739?       2.7196?       2.70274

derived from (9) with percentage points of P_V(v, 4). The latter were obtained by
inverse interpolation of values of P_V(v, 4), not all shown in Table 3. As with
previous comparisons, the term in n^{-1} provides a substantial improvement over
the simple normal approximation; the terms in n^{-2} and n^{-3}, moreover, provide
additional improvement for the extreme points P_V = 0.995 and 0.999. Even
with this improvement, however, accuracy is less than two decimals.
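The reverted series can be evaluated directly to reproduce approximate percentage points. The sketch below assumes the coefficients of the series for z in terms of y are z = y - (y^3 - 3y)/16n - (17y^5 - 8y^3 - 177y)/4608n^2 - (33y^7 + 165y^5 - 1989y^3 + 999y)/73728n^3, as read from the text; the function name and the `order` argument are our own.

```python
def reverted_series(y, n, order=3):
    """Approximate percentage point z from the normal deviate y.

    `order` is the highest power of 1/n retained (0 gives the plain
    normal approximation z = y).
    """
    z = y
    if order >= 1:
        z -= (y ** 3 - 3 * y) / (16 * n)
    if order >= 2:
        z -= (17 * y ** 5 - 8 * y ** 3 - 177 * y) / (4608 * n ** 2)
    if order >= 3:
        z -= (33 * y ** 7 + 165 * y ** 5 - 1989 * y ** 3 + 999 * y) / (73728 * n ** 3)
    return z


if __name__ == "__main__":
    # Normal deviates for the upper 5%, 1% and 0.5% points, as tabulated.
    for y in (1.64485, 2.32635, 2.57583):
        print(y, [round(reverted_series(y, 4, k), 5) for k in range(4)])
```

For n = 4 the 0.995 point comes out near 2.40886 through n^{-3}, against 2.57583 for the normal approximation and 2.40712 by inverse interpolation.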


5. Other possible series for approximating R- and V-distributions. Bennett [1]
has proposed the use of Fourier-Bessel series for computing the distribution of R
and Fourier sine series for V. We have not tested his claim that these series are
more effective than quadratures.


6. Conclusion. For very large n, Rayleigh's approximation to the distribution
of the vector sum or the normal approximation to the distribution of a component
is clearly satisfactory. For very small n (less than 6 for the sum or less than 4 for
a component) neither approximation is remotely satisfactory; but convolution is
feasible even though laborious. For intermediate n, both the Rayleigh
approximation for the sum and the normal approximation for a component can be
substantially improved by inclusion of a single term in n^{-1}; and the additional
computations are not excessive.

The inclusion of terms beyond n^{-1} appears, for the most part, not to be worth
the trouble. For small n the improvement is hardly noticeable; for larger n the
improvement may be appreciable, but it is probably not needed, since the n^{-1}
term will give fair accuracy by itself. However, the use of (9) to approximate
percentage points in the extreme tail may provide an exception.

As for accuracy, we believe that our series (7) through n^{-1} affords a most
unsatisfactory approximation for n = 3, and a glance at Horner's Fig. 7 should
convince anyone. For n = 4, we believe the approximations are still short of
satisfactory, and we suggest use of the exact values in Tables 3 and 4 whenever
these are appropriate. For n = 5, the approximation should be substantially
better than for n = 4; indeed, since dP_V(0, 5)/dz is approximated to within
0.00470 against 0.02613 for dP_V(0, 4)/dz, and since P_V(v, 4) is approximated to
nearly three decimals in the tails, we surmise that P_V(v, 5) is approximated to at
least three decimals, possibly approaching four in the tail.
Note added in proof. We are embarrassed to find that the notation in this
paper disagrees with that in [6]. A table of concordance follows:


notation                                            symbol in [6]    symbol in this paper
r^2/n                                               z                x
v(2/n)^{1/2}                                        not used         z
exponentially distributed transform of r^2/n        y                not used
normally distributed transform of v(2/n)^{1/2}      not used         y
SUMS OF RANDOM PARTITIONS OF RANKS¹


By John W. Tukey
Princeton University


1. Summary. Suppose that the integers 1, 2, ..., N are randomly distributed
among k distinguishable classes with equal probability and without
restrictions. It is natural to denote the class sums by s_1, s_2, ..., s_k and the largest
of these by S. A generating function is obtained for the upper half of the range
of S, namely ¼N(N + 1) ≤ S ≤ ½N(N + 1). For k ≤ 6, this is shown to
provide the usual percentage points for N up to and beyond 10. Tables of 5% and
1% points are provided for k = 2, 3, 6 and N = 1(1)10.

For k = 2, the distribution is that of Wilcoxon’s paired sample test [3]. This 
suggests the application of k = 6 to the six possible orders of three responses. 
This is a possible procedure but the peculiarities of its power are such that its 
use is not recommended. 

However, when three treatments with a natural order are examined in ran- 
domized blocks, a significance procedure can be based on the same distribution 
which is specifically sensitive to average responses in either exactly the same 
or exactly the opposite order as the treatments. 5% and 1% levels are given 
for N = 1(1)10. The procedure may be promising. 

The basic distribution used here is inappropriate for situations, such as 
analysis of variance of ranks, where the number of ranks in each class is re- 
stricted, as by being the same in all classes. 


2. Discussion. The announced significance levels are given in Table 1, both 
in terms of the rank sum for all but the weightiest class, and in terms of the 
largest rank sum. 

A set of observations on 3 treatments falls into one of 6 orders. Thus each 
block of a 3-treatment randomized block design falls into one of 6 classes. If 
we have assigned ranks to the blocks in some way that is independent of which 
treatment is which, we may regard the ranks as assigned to these 6 classes. If 
the complete null hypothesis is correct—if all three treatments are equivalent 
—then the ranks are assigned at random and the distribution applies. 

A significant result thus corresponds to either (i) an unlikely event, or (ii) a 
situation where at least one treatment differs from the other two. Such a test 
is a portmanteau test, and, compared to other three-sample tests, may be ex- 
pected to have somewhat better power when all three treatments differ notably, 
this increase being obtained at the expense of much decreased power when two 
out of the three treatments are equivalent. 


Received October 1, 1956, revised April 26, 1957. 

1 Based on Memorandum Report 40, Statistical Research Group, Princeton University,
which was written while the author was a fellow of the John Simon Guggenheim Memorial
Foundation.
TABLE 1
5% and 1% significance levels for the sum of all other classes (out of k) but the
weightiest (first entries, small values significant) and the largest sum of any one
of k classes (last entries, large values significant) together with actual probability
levels (in central parentheses)

        k = 2                       k = 3                       k = 6
  N     5%            1%            5%            1%            5%            1%

  3     (illegible)   (illegible)   (illegible)   (illegible)   0(.027)6      (illegible)
  4     (illegible)   (illegible)   0(.037)10     (illegible)   2(.051)8      0(.005)10
  5     0(.062)15     (illegible)   1(.037)14     0(.012)15     4(.055)11     2(.008)13
  6     1(.062)20     (illegible)   3(.045)18     1(.012)20     6(.039)15     4(.009)17
  7     2(.047)26     (illegible)   5(.037)23     2(.007)26     10(.053)18    7(.011)21
  8     4(.039)32     0(.008)36     8(.044)28     4(.008)32     14(.045)22    10(.009)26
  9     6(.055)39     2(.012)43     11(.042)34    7(.010)38     20(.049)25    15(.008)30
 10     8(.049)47     3(.010)52     15(.045)40    10(.010)45    24(.043)31    19(.009)36


All this will be true, whatever basis we choose for ranking the blocks. We 
will ameliorate the situation if we favor blocks in which all three treatments 
appear quite distinct—favor them by assigning them high ranks. A plausible 
choice is to rank blocks according to the least difference among the three re- 
sponses. 

As an example, consider data on mean head breadths of termites due orig- 
inally to Warren [2] and utilized by Tippett [1]. A portion of the data, where 
months are taken as blocks, is shown, and analyzed, in Table 2. The lowest 
rank sum for all other classes is 5, which does not reach the 5% level of 4. 

This technique is not, for the present, recommended for use. 


3. Ordered treatments. In the example just discussed, the order of nest 
numbers (presumably) was not expected to be related in any particular way to 
the order of the nest number averages. There are situations, however, where 
there is a natural order for the treatments, such that the treatment averages, 
if different, may be reasonably expected to fall either in the same order or in 
the opposite order. In such a situation, provided we agree to look only at (i) 
rank sums for classes not exactly in treatment order, or (ii) rank sums for classes 
just exactly opposite to treatment order, we can gain a factor of 3 in our signifi- 
cance calculations and may use the significance levels of Table 3. There is no 
need for us to retain the same system of ranking blocks. It now seems better 
to rank according to the least differences between responses to adjacent treatments. 

For an example of the use of this table, we may return to the same data, 
taking nests as blocks, and the months of March, May and November as the 
ordered treatments. The data and analysis are given in Table 4. The rank sum 
outside of the exactly opposite order is 0, and Table 3 shows that this is far 
beyond the 1% level. (Reference to Table 6 shows it to be at the 0.03% level.)
TABLE 2

Analysis of part of Warren's [2] data on mean head breadths of termites
in mm. Nests as unordered treatments. Months as blocks.

Nest Number (less 670)      Min. Diff.      Rank

2.456                       .009            (illegible)
2.626                       .079            (illegible)
2.633                       .063            (illegible)
2.487                       .042            (illegible)
2.410                       (illegible)     (illegible)

Order       Rank Sum      Sum for Other Orders

245         10            (illegible)
425         5             (illegible)
Other 4     0             (illegible)

TABLE 3
5% and 1% levels for the smallest rank sum for "A and other" or "B and other"
(first entries, small values significant) when ranks are randomly allotted to A,
to B, and to each of 4 other classes with equal probability (Actual probability
levels in parentheses.)

  N     5%             1%

  3     (illegible)    0(.009)
  4     (illegible)    1(.009)
  5     (illegible)    3(.011)
  6     (illegible)    6(.013)
  7     (illegible)    9(.009)
  8     (illegible)    13(.010)
  9     (illegible)    18(.009)
 10     (illegible)    23(.011)

It may well be that this sort of procedure for ordered treatments may prove 
useful. Further development, both of tables, and of methods of calculating 
tables, is likely to be required. 


4. Derivation. Obviously, no two of the class sums s_i can both be greater
than half the total. Thus, if m > ¼N(N + 1), the probability that S is greater
than or equal to m is k times the probability that any one s_i, say s_1, is greater
than or equal to m.
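For small N the factor-of-k argument can be confirmed by brute force. The sketch below enumerates every assignment of the ranks 1, ..., N to k classes; it is an illustrative check with a hypothetical function name, not the method used to build the tables.

```python
from itertools import product


def tail_counts(N, k, m):
    """Count assignments with S >= m and with s_1 >= m among all k**N."""
    count_S = count_s1 = 0
    for classes in product(range(k), repeat=N):
        sums = [0] * k
        for rank, c in enumerate(classes, start=1):
            sums[c] += rank                 # rank goes to class c
        if max(sums) >= m:                  # largest class sum S
            count_S += 1
        if sums[0] >= m:                    # first class sum s_1
            count_s1 += 1
    return count_S, count_s1


if __name__ == "__main__":
    N, k = 4, 3
    half_total = N * (N + 1) / 4
    for m in range(int(half_total) + 1, N * (N + 1) // 2 + 1):
        print(m, tail_counts(N, k, m))
```

For N = 4 and k = 3 every m above half the total (m = 6, ..., 10) gives a first count exactly three times the second.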

Write

      s_1 = ½N(N + 1) - i,

where i is at present unrestricted. Let a_{i,N} be the number of arrangements of
TABLE 4

Analysis of part of Warren's [2] data on mean head breadths of termites in mm.
Months as ordered treatments. Nests as blocks. (Scoring: + for Mar. < May <
Aug.; - for Mar. > May > Aug.; 0 for any other order)

Nest Number     Mar.      May       (illegible)     Min. Diff.

668             2.375     2.37?     (illegible)     (illegible)
670             2.613     2.55?     (illegible)     (illegible)
672             2.452     2.396     (illegible)     (illegible)
674             2.515     2.445     (illegible)     (illegible)
675             2.633     2.487     (illegible)     (illegible)

Class     Remainder

+         15
-         0
0         15

TABLE 5
Values of b_{i,N} (see Sect. 4) for k = 3

  i \ N        4       5       6       7       8       9      10

   4          17       (Constant within rows)
   5          25      27
   6          37      43      45
   7          49      59      65      67
   8          57      79      89      95      97
   9          65      99     121     131     137     139
  10          81     131     165     187     197     203     205
  11   (Constant     155     209     243     265     275     281
  12    within       179     265     319     353     375     385
  13    columns)     195     313     403     457     491     513
  14                 211     369     499     589     643     677
  15                 243     441     619     753     843     897

  k^{N-1}     27      81     243     729   2,187   6,561  19,683


1, 2, ..., N into k distinguishable classes for which i has a given value. Then
the generating function of

      i = s_2 + s_3 + ... + s_k

is

      g_N(x) = \sum_i a_{i,N} x^i = \prod_{j=1}^{N} \{1 + (k - 1)x^j\},
TABLE 6
Values of b_{i,N} (see Sect. 4) for k = 6

  i \ N          5          6          7          8            9           10

   5           126          (Constant within a row)
   6           301        306
   7           476        506        511
   8           751        806        836        841
   9         1,026      1,231      1,286      1,316        1,321
  10         1,901      2,256      2,461      2,516        2,546        2,551
  11         2,651      3,281      3,636      3,841        3,896        3,926
  12         3,401      4,906      5,536      5,891        6,096        6,151
  13         4,026      6,406      7,936      8,566        8,921        9,126
  14         4,651      8,406     10,936     12,466       13,096       13,451
  15         7,776     12,906     16,936     19,491       21,021       21,651
  16   (Constant       17,281     23,436     27,616       30,171       31,701
  17    within a       21,031     32,311     38,741       42,946       45,501
  18    column)        24,781     41,186     53,491       60,071       64,276
  19                   27,906     52,409     70,589       83,169       89,774
  20                   31,031     63,061     90,741   (illegible)     122,676
  21                   46,656     88,686    128,366      157,821      176,301
  22                             111,186    165,866      208,696      231,176
  23                             133,061    217,741      280,071      314,676
  24                             151,811    268,991      366,446      431,926
  25                             170,561    332,116      470,196      575,301

  6^{N-1}:      6, 36, 216, 1,296, (illegible), 46,656, 279,936, 1,679,616, 10,077,696
  3·6^{N-1}:    18, 108, 648, 3,888, (illegible), 139,968, 838,558, (illegible)

where the second form is obtained by the following argument: The integer j may
be placed in the first class in one way, and in one of the others in k - 1 ways.
Its contribution to i is zero in the first case and j in the others. This is
represented by the factor 1 + (k - 1)x^j.
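The product form can be expanded mechanically. The sketch below multiplies out the factors 1 + (k - 1)x^j for j = 1, ..., N and returns the coefficients, which count the arrangements by the value of i; the function name is hypothetical.

```python
def arrangement_counts(N, k):
    """Coefficients a[i] of prod_{j=1..N} (1 + (k - 1) x^j).

    a[i] is the number of ways to place the integers 1..N into k classes
    so that the integers outside the first class sum to i.
    """
    coeffs = [1]                              # empty product
    for j in range(1, N + 1):
        new = coeffs + [0] * j                # the "1" term keeps every coefficient
        for i, c in enumerate(coeffs):
            new[i + j] += (k - 1) * c         # the (k - 1) x^j term shifts by j
        coeffs = new
    return coeffs


if __name__ == "__main__":
    print(arrangement_counts(3, 3))
    print(sum(arrangement_counts(4, 6)), 6 ** 4)
```

The coefficients always sum to k^N, since every arrangement is counted exactly once.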

The cumulative distribution of i has the generating function

      h_N(x) = \frac{g_N(x)}{1 - x} = \sum_i b_{i,N} x^i,

where, of course,

      g_N(x) = \{1 + (k - 1)x^N\} g_{N-1}(x),

so that

      h_N(x) = \{1 + (k - 1)x^N\} h_{N-1}(x).

Thus,

      b_{i,N} = b_{i,N-1} + (k - 1) b_{i-N,N-1}.


The probability that S is greater than or equal to ½N(N + 1) - i is, by our
earlier argument,

      \frac{k b_{i,N}}{k^N} = b_{i,N} k^{-(N-1)}

so long as i < ¼N(N + 1).
In case only two of 6 classes are to be considered, we need only multiply by
2 instead of 6, obtaining

      \frac{2 b_{i,N}}{6^N} = \frac{b_{i,N}}{3 \cdot 6^{N-1}}.


5. Values of b_{i,N}. We tabulate some values of b_{i,N} for reference in Table
5 (k = 3) and Table 6 (k = 6).

Note that the values for i ≥ N(N + 1)/4 must be calculated for convenient
recursion, although they are not related to the actual problem. The recursive
process used can be illustrated from Table 5, where 139 = 137 + 2(1), 203 =
197 + 2(3), 275 = 265 + 2(5), 375 = 353 + 2(11), 491 = 457 + 2(17) in
the next to last column, while 37 = 27 + 2(5), 49 = 27 + 2(11), 57 = 27 +
2(15), 65 = 27 + 2(19), 81 = 27 + 2(27) in the fourth column.
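The recursion b_{i,N} = b_{i,N-1} + (k - 1) b_{i-N,N-1} is short enough to code directly. The sketch below is a memoized illustration; the boundary conditions b_{i,0} = 1 for i ≥ 0 and b_{i,N} = 0 for i < 0 are our assumption (they correspond to the single empty arrangement), and the helper name `make_b` is hypothetical.

```python
from functools import lru_cache


def make_b(k):
    """Return a function b(i, N) giving the cumulative counts for k classes."""

    @lru_cache(maxsize=None)
    def b(i, N):
        if i < 0:
            return 0            # no arrangement has a negative remainder
        if N == 0:
            return 1            # the empty arrangement has remainder 0 <= i
        return b(i, N - 1) + (k - 1) * b(i - N, N - 1)

    return b


if __name__ == "__main__":
    b3, b6 = make_b(3), make_b(6)
    print([b3(i, 4) for i in range(4, 11)])   # first column of the k = 3 table
    print(b6(5, 5), b6(6, 5), b6(6, 6))
    # Probability that S >= N(N+1)/2 - i, valid for i < N(N+1)/4:
    print(b3(1, 5) / 3 ** 4)
```

Dividing b_{i,N} by k^{N-1} (or by 3·6^{N-1} for the ordered-treatment case) turns a count into a significance probability, for example 3/81 = .037 for k = 3, N = 5, i = 1.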

REFERENCES
[1] L. H. C. Tippett, The Methods of Statistics, 4th ed., John Wiley, New York, 1952,
especially pp. 182-183.
[2] E. Warren, "Some statistical observations on termites," Biometrika, Vol. 6 (1909),
pp. 329-347.
[3] Frank Wilcoxon, "Probability tables for individual comparisons by ranking methods,"
Biometrics, Vol. 3 (1947), pp. 119-122.





THE RELATIONSHIP ALGEBRA OF AN EXPERIMENTAL 
DESIGN 


By A. T. James 
Division of Mathematical Statistics, C.S.I.R.O.


0. Summary. Important properties of an experimental design, including the 
analysis of variance appropriate to it, are revealed by analysing the structure 
of an algebra generated by the relationships between the experimental units 
of the design. As an illustration, the relationship algebra of balanced incom- 
plete blocks is analysed in detail. 


1. Introduction. An experimental design consists of a set of N experimental 
units, which we shall call plots, classified into subsets in various ways. They
may be classified according to their position, as for example blocks in a ran- 
domized block, rows and columns in a latin square, or the classification may 
be based upon the treatments applied to the plots, or some other characteristics 
which certain plots share. 

Define a relationship, R, between the plots as a set of ordered pairs (i, j) of
them. If the ordered pair (i, j) of plots belongs to R, we say that plot i is
related to plot j by the relationship R. In a randomized block design, for
example, one may define that two plots in the same block bear the relationship,
B, to each other, whilst two plots in different blocks do not. Likewise, a
relationship T, meaning "same treatment" can be defined.


A relationship R among a set of N plots can be expressed as an N × N
matrix of 0's and 1's:

      r_{ij} = 1 if i is related to j by the relationship R,
      r_{ij} = 0 otherwise.

The relationship matrix (r_{ij}) will also be denoted by the letter R.

There are two relationships which appear in any design: (1) the identity re- 
lationship of each plot to itself and (2) the universal relationship which relates 
each plot to every plot in the design. The identity relationship corresponds to 
the matrix I and the universal relationship to a matrix G, all of whose elements
are unity. 

The matrix product RS of two relationship matrices R and S gives a matrix
whose elements have values 0, 1, 2, 3, ... . It can be interpreted in terms of
derived relations. The (i, k)-th element of RS is the number, p, of ways one
can combine a relation (i, j) of R with a relation (j, k) of S to connect i and k.

More generally, the (i, u)-th element of a product RST ... Z is the number
of paths leading from the ith plot to the uth plot via each of the relations R,
S, T, ... , Z successively.

Received January 4, 1957.


cf. Kendall [2], pp. 49, et seq. 
Under the operations of matrix multiplication, matrix addition and scalar 
multiplication, the relationship matrices generate an associative algebra, which 


we shall call the “relationship algebra of the experimental design.” 

From the relationship matrices and all their products, a set of linearly in- 
dependent matrices, R, S, T, --- , Z, can be chosen such that all the matrices 
of the algebra can be expressed as linear combinations, 


$\lambda R + \mu S + \cdots,$


of them. The set R, S, T,--- , Z, is called a “basis” of the algebra. Since the 
product of any two matrices of the algebra can be calculated from the products 
of the basis matrices, a multiplication table for the basis matrices summarizes 
the algebra. 


For example, the relationship algebra of a randomized block has 4 basis
matrices I, B, T, G whose elements are

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise,} \end{cases}$$

$$b_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are in the same block} \\ 0 & \text{otherwise,} \end{cases}$$

$$t_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ have the same treatment} \\ 0 & \text{otherwise,} \end{cases}$$

$$g_{ij} = 1.$$

The multiplication table is

    I    B    T    G
    B    tB   G    tG         b blocks
    T    G    bT   bG         t treatments
    G    tG   bG   btG.


Each basis matrix is written, once in the first row and once in the first column. 
The product of the matrix at the beginning of a row, with the matrix at the 
top of a column is written down in that row and that column. 
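The table can be checked by direct matrix multiplication. The following is a minimal sketch, assuming a hypothetical layout of b = 3 blocks each containing t = 4 treatments once; the helper names (`matmul`, `scale`, `relation`, `table_holds`) are illustrative and not part of the paper.

```python
def matmul(X, Y):
    """Ordinary product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][j] * Y[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]


def scale(c, X):
    """Scalar multiple c * X."""
    return [[c * entry for entry in row] for row in X]


def relation(labels):
    """0/1 matrix relating plots u and v whenever they carry the same label."""
    labels = list(labels)
    return [[1 if u == v else 0 for v in labels] for u in labels]


# Assumed layout: every block contains every treatment exactly once.
b, t = 3, 4
plots = [(block, treatment) for block in range(b) for treatment in range(t)]
N = len(plots)

I = relation(range(N))                             # identity relationship
B = relation(block for block, _ in plots)          # "same block"
T = relation(treatment for _, treatment in plots)  # "same treatment"
G = [[1] * N for _ in range(N)]                    # universal relationship


def table_holds():
    """Check every non-trivial entry of the randomized-block table."""
    return (matmul(B, B) == scale(t, B)
            and matmul(B, T) == G
            and matmul(T, B) == G
            and matmul(T, T) == scale(b, T)
            and matmul(B, G) == scale(t, G)
            and matmul(T, G) == scale(b, G)
            and matmul(G, G) == scale(b * t, G))
```

Calling `table_holds()` on this layout confirms each product in the table; changing `b` and `t` exercises other sizes.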
The algebra is the direct product of two subalgebras:

    I    B           I    T
    B    tB     ×    T    bT.

For an n × n latin square, the multiplication table of the basis matrices is

    I    R    C    T    G
    R    nR   G    G    nG
    C    G    nC   G    nG
    T    G    G    nT   nG
    G    nG   nG   nG   n²G.

R, C, T are the relations same row, same column, and same treatment
respectively.

Balanced incomplete blocks have an interesting algebra. The two relations,
same block B and same treatment T, satisfy the equation

(1) $TBT = \lambda G + (r - \lambda)T,$

which reflects the requirement of balance, namely, that each pair of treatments
occur together in $\lambda$ blocks. Here r is the number of replicates and k the
number of plots per block.

One can verify Eq. (1) by counting the number of paths from plot i to plot j
using the relations T, B and T successively. If i and j have different treatments
there are exactly $\lambda$ blocks containing both these treatments, through which
the connection can be established. Therefore there are $\lambda$ paths and the (i, j)-th
element of TBT is $\lambda$. If i and j have the same treatment, the number of paths
is clearly r and the (i, j)-th element of TBT has this value. Thus Eq. (1) holds.
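The path-counting argument can also be checked numerically. The sketch below assumes one particular balanced incomplete block design, the seven-block design on seven treatments with k = 3, r = 3 and each pair of treatments together in exactly one block; the block list and helper names are illustrative choices, not taken from the paper.

```python
def matmul(X, Y):
    """Ordinary product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][j] * Y[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def scale(c, X):
    return [[c * entry for entry in row] for row in X]


def relation(labels):
    """0/1 matrix relating plots u and v whenever they carry the same label."""
    labels = list(labels)
    return [[1 if u == v else 0 for v in labels] for u in labels]


# Assumed design: 7 treatments in 7 blocks of size 3, every pair together once.
BLOCKS = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
          (1, 4, 6), (2, 3, 6), (2, 4, 5)]
r, k, lam = 3, 3, 1

# One plot per (block, treatment) incidence, so N = 21.
plots = [(blk, trt) for blk, members in enumerate(BLOCKS) for trt in members]
N = len(plots)

B = relation(blk for blk, _ in plots)   # "same block"
T = relation(trt for _, trt in plots)   # "same treatment"
G = [[1] * N for _ in range(N)]         # universal relationship

TBT = matmul(matmul(T, B), T)
rhs = add(scale(lam, G), scale(r - lam, T))   # lambda*G + (r - lambda)*T
```

For this design `TBT == rhs` holds, while `matmul(T, B)` and `matmul(B, T)` differ, which is the noncommutativity used later in the structure analysis.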

With this equation one can work out the multiplication table (Table 1). 

TABLE 1

    I    | G     | B     | T              | BT               | TB               | BTB
    G    | NG    | kG    | rG             | krG              | krG              | k²rG
    B    | kG    | kB    | BT             | kBT              | BTB              | kBTB
    T    | rG    | TB    | rT             | λG + (r − λ)T    | rTB              | λkG + (r − λ)TB
    BT   | krG   | BTB   | rBT            | λkG + (r − λ)BT  | rBTB             | λk²G + (r − λ)BTB
    TB   | krG   | kTB   | λG + (r − λ)T  | λkG + (r − λ)kT  | λkG + (r − λ)TB  | λk²G + (r − λ)kTB
    BTB  | k²rG  | kBTB  | λkG + (r − λ)BT| λk²G + (r − λ)kBT| λk²G + (r − λ)BTB| λk³G + (r − λ)kBTB.

As would be expected, the algebra of relationships corresponds to the
numerical operations involved in the analysis of the design. If the observations
$x_1, x_2, \ldots, x_N$ taken on the N plots respectively are written as a column
vector x, then multiplication by the relationship matrices gives linear
transformations of x; e.g., for a block design, the transformation

$$x \to Bx$$

is the operation of replacing each value $x_i$ by the total for the block in which
the ith plot occurs. Similarly, viewed in this way, the matrices T and G are
operators which replace each observation $x_i$ by the treatment total or the grand
total respectively.

When the relationship matrices are thus considered as operators, their prod- 
ucts are often obvious—e.g., for the randomized block, clearly, 


BT = TB = G. 


2. Structure of the relationship algebra. If j bears the same relation to i as
i bears to j, as will usually be the case, the relationship matrices will be
symmetric. Note, however, that their products will not necessarily be so, as can
be seen in the case of balanced incomplete blocks where

$$(TB)' = B'T' = BT \neq TB.$$


The fact that the algebra can be generated by symmetric matrices has a very 
important implication in its mathematical analysis. 

Let $\mathfrak{V}$ be the vector space of column vectors, x. A subspace $\mathfrak{V}_1$ is invariant
under the relationship algebra $\mathfrak{A}$, i.e., $\mathfrak{A}\mathfrak{V}_1 \subseteq \mathfrak{V}_1$, if and only if it is invariant
under the relationship matrices which generate $\mathfrak{A}$. But these are symmetric;
hence the orthogonal complement, $\mathfrak{V}_2$, of $\mathfrak{V}_1$ is also invariant under them.
Hence $\mathfrak{V}_2$ is invariant under $\mathfrak{A}$; i.e., $\mathfrak{A}$ is a completely reducible set of linear
transformations of $\mathfrak{V}$. Therefore $\mathfrak{A}$ is a semi-simple algebra. According to a
theorem of Wedderburn, a semi-simple algebra is isomorphic to a direct sum of
complete matrix algebras² (see Van der Waerden [3], Chap. XVI).

Hence, an algebra generated by symmetric relations is isomorphic to a direct
sum of complete matrix algebras.


2 It may be necessary to extend the field of scalars from the real numbers to the complex
numbers.

Example 1. The randomized block. As one can see by inspection of the
multiplication table, the algebra is commutative. Now the algebra of all 2 × 2
matrices or of matrices of higher order is not commutative. Hence the algebra of
the randomized block cannot be isomorphic to a direct sum of matrix algebras
which contains one of these. Thus it must be isomorphic to a direct sum of
1 × 1 matrix algebras; i.e., the algebra of the randomized block is isomorphic
to the algebra of all diagonal 4 × 4 matrices,

$$\begin{bmatrix} * & & & \\ & * & & \\ & & * & \\ & & & * \end{bmatrix}.$$


Example 2. The latin square. As the algebra is commutative and 5-dimensional,
it is isomorphic to the algebra of all diagonal 5 × 5 matrices.

Example 3. Balanced incomplete blocks. The algebra is 7-dimensional and
noncommutative. Since it is noncommutative, the direct sum of complete matrix
algebras to which it is isomorphic must include a complete matrix algebra of
order at least 2 × 2, but not more than 2 × 2, because a complete 3 × 3
matrix algebra is 9-dimensional and our algebra has only 7 dimensions. Hence
the algebra of balanced incomplete blocks is isomorphic to the algebra of all
matrices of the form

$$\begin{bmatrix} * & & & & \\ & * & & & \\ & & * & & \\ & & & * & * \\ & & & * & * \end{bmatrix}.$$

3. Analysis of the relationship algebra. A direct sum of k matrix algebras
can be decomposed into its k component parts; e.g., for k = 2,

$$\begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix} = \begin{bmatrix} A_1 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & A_2 \end{bmatrix},$$

where $A_1$ and $A_2$ stand for the blocks belonging to the two component algebras.
Each part, e.g. the set of all matrices of the form $\begin{bmatrix} A_1 & 0 \\ 0 & 0 \end{bmatrix}$,
is a minimum two-sided ideal. The product of two matrices belonging to different
parts is clearly zero; i.e., the different parts annihilate each other.
Correspondingly, the relationship algebra $\mathfrak{A}$, being isomorphic to a direct sum of k
complete matrix algebras, can be expressed as a direct sum of minimum two-sided
ideals which annihilate each other:

(2) $\mathfrak{A} = \mathfrak{A}_1 + \mathfrak{A}_2 + \cdots + \mathfrak{A}_k;$

i.e., any element of $\mathfrak{A}$ can be expressed uniquely as a sum of elements belonging
respectively to $\mathfrak{A}_1, \ldots, \mathfrak{A}_k$.
In particular, the identity element can be so expressed,

(3) $I = E_1 + E_2 + \cdots + E_k,$

and the components $E_i$ will be idempotent. Writing the corresponding quadratic
forms, we have the decomposition of the sum of squares for the analysis of
variance

(4) $\sum x_i^2 = x'x = x'E_1x + x'E_2x + \cdots + x'E_kx.$

So far, the decomposition is unique. If one of the ideals, e.g. $\mathfrak{A}_1$, is
isomorphic to an r × r matrix algebra, then $x'E_1x$ can be further decomposed into r
parts, each on the same number of degrees of freedom, but the decomposition
is not unique. The example of balanced incomplete blocks will illustrate this.


4. The analysis of the relationship algebra of the balanced incomplete block
design. The algebra can be analysed by the standard procedures. For purposes
of illustration the method is given in detail.

The problem is to decompose the algebra into its minimum two-sided ideals
and to find the corresponding principal idempotents. These are the unit elements
of the ideals. Since, as we have seen, our algebra is isomorphic to the direct
sum of three 1 × 1 matrix algebras and a 2 × 2 matrix algebra, there must be
three 1-dimensional two-sided ideals and one 4-dimensional two-sided ideal,
which are respectively isomorphic to the matrix algebras. $E_1, E_2, E_3, E_4$ will
denote the respective principal idempotents. Our first step is to pick out these
ideals.

The multiples of G form a 1-dimensional two-sided ideal whose idempotent
is $E_1 = \frac{1}{N}G$. The corresponding sum of squares, $x'E_1x$, is just the correction
factor, (grand total)²/N. Let us consider the algebra modulo G. We can take care of
G later on by replacing all sums of squares by the corresponding sums of squares
about the mean. There is now a 6-dimensional algebra to be analysed. Its
multiplication table is obtained by putting G = 0 in the original multiplication table.
We must look for some more two-sided ideals.

The linear combinations of the basis elements containing T, namely T, BT,
TB, BTB, form a two-sided ideal, because all multiples of these elements
again contain T. This must be the 4-dimensional ideal that we are seeking. The
principal idempotent, $E_4$, of this ideal corresponds to the unit element of the
2 × 2 matrix algebra to which the ideal is isomorphic.

One can set up such a correspondence by finding the left-regular representation,
namely by considering how T, BT, TB, BTB are transformed by left
multiplication by each of them in turn. Working modulo G,

$$T\,[T \;\; BT \;\; TB \;\; BTB] = [T^2 \;\; TBT \;\; T^2B \;\; TBTB] = [T \;\; BT \;\; TB \;\; BTB] \begin{bmatrix} r & r-\lambda & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & r & r-\lambda \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

The row [T BT TB BTB] is written formally as a row vector, even though the
elements belong to an algebra instead of being numbers. This "vector" and the
matrix are to be multiplied by row-column multiplication. Similarly,

$$B\,[T \;\; BT \;\; TB \;\; BTB] = [T \;\; BT \;\; TB \;\; BTB] \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & k & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & k \end{bmatrix}.$$


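Each 4 × 4 matrix is the direct sum of two identical 2 × 2 matrices, as is to
be expected in a regular representation. Hence we can set up an isomorphism
between the ideal and the 2 × 2 matrices:

$$T \leftrightarrow \begin{bmatrix} r & r-\lambda \\ 0 & 0 \end{bmatrix}.$$

Although B does not belong to the ideal, we can calculate the matrices
isomorphic to the elements BT, TB, BTB, which do belong to the ideal, by using the
map

$$B \to \begin{bmatrix} 0 & 0 \\ 1 & k \end{bmatrix},$$

$$BT \leftrightarrow \begin{bmatrix} 0 & 0 \\ r & r-\lambda \end{bmatrix}, \qquad TB \leftrightarrow \begin{bmatrix} r-\lambda & k(r-\lambda) \\ 0 & 0 \end{bmatrix}, \qquad BTB \leftrightarrow \begin{bmatrix} 0 & 0 \\ r-\lambda & k(r-\lambda) \end{bmatrix}.$$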


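To find the principal idempotent $E_4$ of the ideal, we must express the matrix
corresponding to it, namely the unit matrix, in terms of these matrices:

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \frac{1}{(k-1)r+\lambda} \left\{ k \begin{bmatrix} r & r-\lambda \\ 0 & 0 \end{bmatrix} + \frac{r}{r-\lambda} \begin{bmatrix} 0 & 0 \\ r-\lambda & k(r-\lambda) \end{bmatrix} - \begin{bmatrix} 0 & 0 \\ r & r-\lambda \end{bmatrix} - \begin{bmatrix} r-\lambda & k(r-\lambda) \\ 0 & 0 \end{bmatrix} \right\}.$$

Hence,

$$E_4 = \frac{kT + r(r-\lambda)^{-1}BTB - BT - TB}{(k-1)r+\lambda} \pmod{G}.$$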
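The 2 × 2 identity behind this expression for the principal idempotent can be checked with exact arithmetic. The sketch below assumes the correspondence in which T maps to the matrix with rows (r, r − λ), (0, 0) and B maps to the matrix with rows (0, 0), (1, k), and uses the sample values r = 3, k = 3, λ = 1; the helper names `mul2` and `combine` are illustrative.

```python
from fractions import Fraction


def mul2(X, Y):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(X[i][j] * Y[j][c] for j in range(2)) for c in range(2)]
            for i in range(2)]


def combine(terms):
    """Linear combination sum(coefficient * matrix) of 2 x 2 matrices."""
    return [[sum(c * X[i][j] for c, X in terms) for j in range(2)]
            for i in range(2)]


# Sample parameter values (an assumption made for the check).
r, k, lam = Fraction(3), Fraction(3), Fraction(1)

T2 = [[r, r - lam], [0, 0]]   # image of T
B2 = [[0, 0], [1, k]]         # image of B under the auxiliary map

BT2 = mul2(B2, T2)            # image of BT
TB2 = mul2(T2, B2)            # image of TB
BTB2 = mul2(B2, TB2)          # image of BTB

denominator = (k - 1) * r + lam
E4_image = combine([(k / denominator, T2),
                    (r / (r - lam) / denominator, BTB2),
                    (-1 / denominator, BT2),
                    (-1 / denominator, TB2)])
```

With these values `E4_image` is the 2 × 2 unit matrix, so the stated combination of T, BTB, BT and TB does correspond to the unit element of the ideal.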


Having dealt with the 1-dimensional ideal generated by G and the 4-dimensional
ideal, we now have to find the idempotents of the other two 1-dimensional
ideals. We can obtain an algebra isomorphic to the direct sum of these
two-sided ideals by taking the whole algebra modulo G, T. If we put G = 0, T = 0,
the multiplication table reduces to

    I    B
    B    kB.

This 2-dimensional algebra splits into two 1-dimensional ideals whose
idempotents are $k^{-1}B$ and $I - k^{-1}B$. But, modulo G, T, the algebra is generated
by the two idempotents $E_2$ and $E_3$; hence we can put

$$E_2 = k^{-1}B, \qquad E_3 = I - k^{-1}B \pmod{G, T}.$$

Dropping the modulo T, we may write

$$k^{-1}B = E_2 + F_2, \qquad I - k^{-1}B = E_3 + F_3 \pmod{G},$$

where $F_2$ and $F_3$ are the components of $k^{-1}B$ and $I - k^{-1}B$ belonging to the
4-dimensional two-sided ideal, which was mapped on zero when we worked
modulo T by putting T = 0. Thus, modulo G,

$$F_2 = F_2E_4 = k^{-1}BE_4 = [k(r-\lambda)]^{-1}BTB,$$

$$F_3 = F_3E_4 = (I - k^{-1}B)E_4 = [(k-1)r+\lambda]^{-1}(kT + k^{-1}BTB - BT - TB) = k[(k-1)r^2 + \lambda r]^{-1}(T - k^{-1}BT)(T - k^{-1}TB),$$

since $E_2E_4 = 0$ and $E_3E_4 = 0$. Therefore,

$$E_2 = k^{-1}B - F_2 = k^{-1}B - [k(r-\lambda)]^{-1}BTB,$$

$$E_3 = (I - k^{-1}B) - F_3 = I - E_2 - E_4.$$

Now we have the principal idempotents of all the ideals.
At the same time, we obtain a further decomposition of $E_4$:

$$E_4 = (k^{-1}B + (I - k^{-1}B))E_4 = (E_2 + F_2)E_4 + (E_3 + F_3)E_4 = F_2 + F_3.$$

$F_2$ and $F_3$ are idempotents but not principal idempotents, because, unlike $E_4$,
they do not correspond to the unit matrix $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ of the 2 × 2 matrix algebra,
but, in the appropriate isomorphism, they correspond to the idempotent matrices
$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ and $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ respectively. This part of the decomposition is not unique,
but nevertheless it is appropriate, as may be seen by putting the usual
interpretations on the quantities.

The idempotents may be arranged as in Table 2, all of them modulo G.
The quadratic forms, of which the idempotents are the matrices, are the sums
of squares in the usual analysis of variance as given in Fisher and Yates [1].

TABLE 2

    Blocks (ignoring treatments)
        Treatment component ............ [k(r − λ)]⁻¹BTB
        Remainder ...................... k⁻¹B − [k(r − λ)]⁻¹BTB
        Total .......................... k⁻¹B
    Treatments (eliminating blocks) .... k[(k − 1)r² + λr]⁻¹(T − k⁻¹BT)(T − k⁻¹TB)
    Intra-block error .................. by difference
    Total .............................. I


5. Conclusion. Whilst it is too early to see the full implications of the rela- 
tionship algebra, the following points may be noted: 

1. Anyone investigating or proposing a new experimental design can throw 
considerable light upon it by enumerating the basic relationships set up by the 
design, and analysing the algebra they generate. 

2. The relationship algebra leads to a simple and natural notation for the 
component sums of squares appearing in an analysis of variance. The sums of 
squares are specified by their matrices. The table above for balanced incomplete 
blocks, illustrates this point. 

3. The analysis of variance corresponding to the analysis of the algebra into 
its minimum two-sided ideals, is unique. If the minimum two-sided ideals are 
one dimensional, no further decomposition is possible. More precisely, the re- 
lationships which have generated the algebra will not resolve the sums of squares 
beyond this point. 

All minimum two-sided ideals are one dimensional if and only if the algebra 
is commutative. Designs possessing such an algebra have a unique analysis 
whose components are automatically orthogonal. The randomized block and 
latin square are of this type. 

4. When the algebra contains a minimum two-sided ideal isomorphic with 
an m X m matrix algebra, the sums of squares corresponding to that two-sided 
ideal can be decomposed into m components each on the same number of de- 
grees of freedom; but the decomposition is not unique. However, the system 
of possible decompositions is delimited by the fact that a transformation from 
one decomposition to another, induces an automorphism of the algebra. This 
point deserves a more detailed treatment than I can give at the moment. 

5. For certain designs, the relationship algebra is the commutator algebra
of the representation of a group expressing the symmetry of the experimental
design. Such will be the subject of a further paper.


6. Acknowledgment. The idea of classifying the pairs of plots according to 
the relationships between them was suggested to the author by Wilkinson [4], 
who introduced it in connection with his work on missing values.

ESTIMATES OF ERROR FOR TWO MODIFICATIONS OF THE
ROBBINS-MONRO STOCHASTIC APPROXIMATION
PROCESS¹


By H. D. Block
Cornell University


1. Introduction. The Robbins-Monro procedure is a process of the following
form. For each number x, $Y_x$ is a chance variable having a variance which is a
bounded function of x; i.e., $E(Y_x - E(Y_x))^2 \le \sigma^2 < \infty$. The regression curve
$y = f(x) = E(Y_x)$ is presumed to be unknown but supposed to lie below the
horizontal line $y = \alpha$ for $x < \theta$ and above it for $x > \theta$, where $\alpha$ is specified and
$\theta$ is to be estimated. Let

(1) $X_{n+1} = X_n - a_n(Y_{X_n} - \alpha) \qquad (n = 1, 2, \ldots),$

where the $a_n$ are specified numbers and assume that $E(X_1 - \theta)^2 = V^2 < \infty$;
then under suitable conditions $X_n$ converges to $\theta$. Following the paper of Robbins
and Monro [10] there appeared a succession of papers ([1] through [14]) in which
the conditions for convergence were relaxed, the type of convergence
strengthened, the asymptotic distribution of $X_n$ found and the whole process generalized
and simplified. The question of an optimal stopping rule however remains open.
We assume here that the regression line lies between two straight lines with
finite and positive slopes, i.e.,

(2) $\alpha + m(x - \theta) \le f(x) \le \alpha + M(x - \theta)$, if $x \ge \theta$,

    $\alpha + m(x - \theta) \ge f(x) \ge \alpha + M(x - \theta)$, if $x \le \theta$,


with $0 < m \le M < \infty$. With this condition Dvoretzky [6] showed how to
choose the $a_n$ to minimize a bound on the error $E(X_{N+1} - \theta)^2$ after a fixed
number N of observations $Y_{X_1}, \ldots, Y_{X_N}$. Here we give analogous results for two
modifications of the Robbins-Monro procedure: (i) instead of taking one
observation at $X_k$ one takes several and uses the average instead of $Y_{X_n}$ in (1), i.e.,

(3) $X_{k+1} = X_k - a_k\left(\dfrac{Y^{(1)}_{X_k} + \cdots + Y^{(n_k)}_{X_k}}{n_k} - \alpha\right),$

the idea being that it may cost less to take several observations at one point
than the same number of observations at different points; and (ii) using (3)
with $a_k = a$ (k = 1, 2, ...), the object being simplicity in performing the
experiment.

Clearly with the $n_k$ as well as $a_k$ at our disposal in (i) we can get $E(X_{p+1} - \theta)^2$,
where

(4) $\sum_{k=1}^{p} n_k = N,$

with (3) at least as small as $E(X_{N+1} - \theta)^2$ by (1). We shall see that we can't do
any better with (i) either, so the only saving in (i) is in the smaller number of
"set-ups" required. On the other hand we cannot expect to do better with (ii)
but we shall see that under certain conditions we can do about as well, so that
the increase in simplicity may be worthwhile.
The notations and assumptions introduced above will be used throughout the
remainder of this paper.


The author is indebted to Professors J. Kiefer and J. Wolfowitz for introducing 
him to this subject. 


2. The Robbins-Monro process. For the sake of completeness and
comparison we give first the Robbins-Monro case already handled by Dvoretzky [6].

THEOREM 1. Assume that

(α) $V^2 \le \dfrac{2\sigma^2}{m(M - m)}.$

If

(β) $a_n = \dfrac{mV^2}{\sigma^2 + m^2V^2n},$

then with $X_n$ given by (1),

(γ) $E(X_{N+1} - \theta)^2 \le \dfrac{V^2\sigma^2}{\sigma^2 + m^2V^2N};$ and

(δ) the constants $a_n$ given by (β) are optimal in the sense that if the $a_n$ do not satisfy
(β) then there exist processes (1) satisfying (α) for which (γ) does not hold.
REMARK 1. If the condition (α) is not satisfied it is still not difficult to find the
optimal $a_n$'s from the derivation below (see Dvoretzky [6]). In this case the
estimate of error is not so neat; if we take instead $a_n = 2/[M + (2n - 1)m]$ it
is not hard to verify that

(5) $E(X_{N+1} - \theta)^2 \le \dfrac{(M - m)^2V^2 + 4N\sigma^2}{[M + (2N - 1)m]^2}.$
PROOF. Let $Y_{X_n} = f(X_n) + \epsilon_n$, where $E(\epsilon_n \mid X_1, \ldots, X_n) = 0$ and
$E(\epsilon_n^2 \mid X_1, \ldots, X_n) \le \sigma^2$. Then from (1)

$$X_{n+1} - \theta = X_n - \theta - a_n(f(X_n) - \alpha) - a_n\epsilon_n = [1 - a_ng(X_n)](X_n - \theta) - a_n\epsilon_n,$$

where $f(x) - \alpha = g(x)(x - \theta)$ and in virtue of (2), g(x) satisfies $0 < m \le g(x) \le M$.
Hence $E(X_{n+1} - \theta)^2 \le E[(1 - a_ng(X_n))^2(X_n - \theta)^2] + a_n^2\sigma^2$. Now

$$1 - a_ng(X_n) \le 1 - a_nm$$

and since from (α) and (β) $a_n \le a_1 \le 2/(M + m)$ we also have
$a_ng(X_n) - 1 \le a_nM - 1 \le 1 - a_nm$. Hence

(6) $E(X_{n+1} - \theta)^2 \le (1 - a_nm)^2E(X_n - \theta)^2 + a_n^2\sigma^2,$

or using (β) and then iterating from n = N downward

$$E(X_{N+1} - \theta)^2 \le \frac{[\sigma^2 + (N-1)m^2V^2]^2E(X_N - \theta)^2 + \sigma^2m^2V^4}{(\sigma^2 + Nm^2V^2)^2} \le \frac{[\sigma^2 + (N-2)m^2V^2]^2E(X_{N-1} - \theta)^2 + 2\sigma^2m^2V^4}{(\sigma^2 + Nm^2V^2)^2} \le \cdots \le \frac{V^2\sigma^2}{\sigma^2 + m^2V^2N}.$$

To verify (δ) let g(x) = m and var $\epsilon_n = \sigma^2$. Then (6) becomes an equality and the
unique minimizing $a_n$'s are given by (β); to see this let $E(X_n - \theta)^2 = B_n$ and
note that from (6) (with n = N), having fixed $a_1, \ldots, a_{N-1}$, the value of $B_N$ is
fixed and $B_{N+1} = (1 - a_Nm)^2B_N + a_N^2\sigma^2$. The minimizing value of $a_N$ satisfies
$a_N = mB_N / (\sigma^2 + m^2B_N)$, and the minimum value of $B_{N+1}$ satisfies
$B_{N+1} = (\sigma^2B_N) / (\sigma^2 + m^2B_N)$ or

$$\frac{1}{B_{N+1}} = \frac{1}{B_N} + \frac{m^2}{\sigma^2}.$$

Hence

$$\frac{1}{B_{N+1}} \le \frac{1}{V^2} + \frac{Nm^2}{\sigma^2},$$

with equality holding if and only if the $a_n$ satisfy (β).
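The recursion in this proof is easy to iterate exactly. The following sketch assumes the extremal case g(x) = m with error variance exactly σ², so that (6) is an equality, and uses the sample values m = 1, M = 2, σ² = 1, V² = 1, N = 10; the function name `robbins_monro_bound` is illustrative.

```python
from fractions import Fraction


def robbins_monro_bound(m, sigma2, V2, N):
    """Iterate B_{n+1} = (1 - a_n m)^2 B_n + a_n^2 sigma^2 with a_n from (beta).

    Returns the final bound and, for each step, the pair
    (a_n from (beta), minimizing step size m*B_n / (sigma^2 + m^2*B_n)).
    """
    B = V2
    steps = []
    for n in range(1, N + 1):
        a_n = m * V2 / (sigma2 + m * m * V2 * n)
        greedy = m * B / (sigma2 + m * m * B)
        steps.append((a_n, greedy))
        B = (1 - a_n * m) ** 2 * B + a_n ** 2 * sigma2
    return B, steps


# Sample values (assumptions); they satisfy V^2 <= 2*sigma^2 / (m*(M - m)).
m, M = Fraction(1), Fraction(2)
sigma2, V2, N = Fraction(1), Fraction(1), 10

bound, steps = robbins_monro_bound(m, sigma2, V2, N)
closed_form = V2 * sigma2 / (sigma2 + m * m * V2 * N)
```

Here `bound` agrees exactly with `closed_form`, and at every step the step size from (β) coincides with the minimizing value, which is the content of the optimality claim in this extremal case.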
REMARK 2. It is of some interest to compare the error for the Robbins-Monro
procedure

$$\frac{\sigma^2}{(\sigma^2/V^2) + m^2N}$$

with the error of the maximum likelihood estimator of the special situation when
the regression is known to be linear with known slope m with errors normal
$(0, \sigma^2)$. If we take N values for x: $X_1, X_2, \ldots, X_N$ by a procedure which is
independent of $\theta$, and observe $Y_n = f(X_n) + \epsilon_n = \alpha + m(X_n - \theta) + \epsilon_n$,
the joint density function of $X_1, \ldots, X_N, Y_1, \ldots, Y_N$ is

$$\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{N} \exp\left\{-\frac{1}{2\sigma^2}\sum_{n=1}^{N}\bigl(y_n - \alpha - m(x_n - \theta)\bigr)^2\right\} dF(x_1, \ldots, x_N),$$

where $\sigma$ is known and $\theta$ is to be estimated. The maximum likelihood estimator is
$\hat{\theta} = \bar{X} - (1/m)(\bar{Y} - \alpha)$, which is distributed normally $(\theta, \sigma^2/m^2N)$. Now
$\sigma^2 / [(\sigma^2/V^2) + m^2N] < \sigma^2/m^2N$ so that as long as $V^2 < \infty$ the Robbins-Monro
estimator is better. The condition $V^2 < \infty$ is guaranteed, e.g., by taking $X_1 = 0$.

3. The averaged Robbins-Monro process.
THEOREM 2. Let

(i) $\dfrac{2\sigma^2}{m(M - m)V^2} = Q \ge 1,$

and let $n_1, n_2, \ldots, n_p$ be positive integers whose sum is N and such that

(ii) $n_1 \le Q, \quad n_2 \le Q + \dfrac{2mn_1}{M - m}, \quad n_3 \le Q + \dfrac{2m(n_1 + n_2)}{M - m}, \quad \ldots, \quad n_p \le Q + \dfrac{2m(n_1 + \cdots + n_{p-1})}{M - m}.$

If

(iii) $a_k = \dfrac{mn_kV^2}{\sigma^2 + m^2V^2(n_1 + n_2 + \cdots + n_k)} \qquad (k = 1, \ldots, p),$

and the $X_k$ are given by Eq. (3), then

(iv) $E[X_{p+1} - \theta]^2 \le \dfrac{V^2\sigma^2}{\sigma^2 + m^2V^2N};$ and

(v) the constants $a_k$ are optimal in the sense that if the $a_k$ do not satisfy (iii) then
there exists a process (3) satisfying (i) and (ii) for which (iv) does not hold.

REMARK 3. Again if (i) is not satisfied it is not hard to find the optimal $a_k$'s
from the derivation below. Of course we can always achieve the estimate (5),
with the $a_k$'s chosen as in Remark 1.
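That the bound in Theorem 2 does not depend on how N is split can be seen by iterating the recursion $B_{k+1} = (1 - a_km)^2B_k + a_k^2\sigma^2/n_k$ with the step sizes (iii). The sketch below assumes the extremal case in which this recursion is an equality, with sample values m = 1, M = 2, σ² = 1, V² = 1 and N = 10; the function names are illustrative.

```python
from fractions import Fraction


def averaged_bound(partition, m, sigma2, V2):
    """Iterate B_{k+1} = (1 - a_k m)^2 B_k + a_k^2 sigma^2 / n_k using (iii)."""
    B = V2
    used = 0
    for n_k in partition:
        used += n_k
        a_k = m * n_k * V2 / (sigma2 + m * m * V2 * used)
        B = (1 - a_k * m) ** 2 * B + a_k ** 2 * sigma2 / n_k
    return B


def satisfies_ii(partition, m, M, sigma2, V2):
    """Check n_k <= Q + 2m(n_1 + ... + n_{k-1}) / (M - m) for every k."""
    Q = 2 * sigma2 / (m * (M - m) * V2)
    earlier = 0
    for n_k in partition:
        if n_k > Q + 2 * m * earlier / (M - m):
            return False
        earlier += n_k
    return True


# Sample values (assumptions); here Q = 2.
m, M = Fraction(1), Fraction(2)
sigma2, V2 = Fraction(1), Fraction(1)

partitions = [(2, 3, 5), (2, 6, 2), (1,) * 10]   # each sums to N = 10
bounds = [averaged_bound(part, m, sigma2, V2) for part in partitions]
target = V2 * sigma2 / (sigma2 + m * m * V2 * 10)
```

All three partitions satisfy (ii) and give the same value `target`; a partition such as (3, 3, 4) is rejected by `satisfies_ii` because its first batch exceeds Q.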

PROOF. From Eq. (3)

$$X_{k+1} - \theta = [1 - a_kg(X_k)](X_k - \theta) - \frac{a_k}{n_k}\bigl(\epsilon_k^{(1)} + \cdots + \epsilon_k^{(n_k)}\bigr),$$

$$E(X_{k+1} - \theta)^2 \le E\{[1 - a_kg(X_k)]^2(X_k - \theta)^2\} + \frac{a_k^2\sigma^2}{n_k};$$

again $1 - a_kg(X_k) \le 1 - a_km$ and $a_k \le 2/(M + m)$ so that

(7) $E(X_{k+1} - \theta)^2 \le (1 - a_km)^2E(X_k - \theta)^2 + \dfrac{a_k^2\sigma^2}{n_k}.$

Using (iii) and iterating from k = p down, one finds (iv). Again with g(x) = m
and var $(\epsilon_k^{(j)}) = \sigma^2$, inequality (7) (with k = p) becomes an equality which is
minimized when $a_p = (mB_pn_p) / (\sigma^2 + m^2B_pn_p)$ and

$$B_{p+1} = \frac{B_p\sigma^2}{\sigma^2 + m^2B_pn_p},$$

or

$$\frac{1}{B_{p+1}} \le \frac{1}{B_p} + \frac{m^2n_p}{\sigma^2} \le \cdots \le \frac{1}{B_1} + \frac{m^2(n_1 + \cdots + n_p)}{\sigma^2};$$

i.e., $B_{p+1} \ge (\sigma^2V^2) / (\sigma^2 + m^2V^2N)$, and the equality is achieved if and only if
the $a_k$ are given by (iii).

4. A fixed value for the $a_k$. Here $X_k$ is given by (3) with

$$a_k = a \qquad (k = 1, 2, \ldots, p).$$
THEOREM 3. Let

(a) $\dfrac{2\sigma^2}{m(M - m)V^2} \ge \dfrac{2Nm}{M - m}\left[\left(1 + \dfrac{2m}{M - m}\right)^{N} - 1\right]^{-1}.$

Let p be any integer satisfying

(b) $\dfrac{\log\left(1 + \dfrac{Nm^2V^2}{\sigma^2}\right)}{\log\left(1 + \dfrac{2m}{M - m}\right)} \le p \le N.$

It follows easily from (a) that at least one such p exists.
If

(c) $a = \dfrac{1}{m}\left[1 - \left(\dfrac{\sigma^2}{\sigma^2 + m^2V^2N}\right)^{1/p}\right],$

and if the equations

(d) $n_k = \dfrac{(1 - am)^{p-k}(\sigma^2 + V^2m^2N)\,a}{V^2m} \qquad (k = 1, 2, \ldots, p)$

define $n_1, \ldots, n_p$ as integers, then

(e) $E(X_{p+1} - \theta)^2 \le \dfrac{V^2\sigma^2}{\sigma^2 + m^2V^2N},$

and

(f) for a fixed value of p satisfying (b) this choice of a, $n_1, \ldots, n_p$ is optimal in
the sense that if (d) defines integers but $n_k$ are chosen satisfying (4) but not condition
(d), or if a does not satisfy (c) then there exists a process (3) with the $a_k$'s equal,
for which (a) and (b) are satisfied but (e) does not hold.

REMARK 4. The condition (a) here is less stringent than the corresponding
conditions (α), (i) in the preceding theorems; but the assumption that (d)
defines integers is of course unpleasant; if (d) does not give integers, but one
chooses the $n_k$ as the nearest integers to them then one would expect that the
estimate (e) would not be very much in error, especially if N is large. We have
not done the computations for that case.
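The algebra behind the fixed-step design can be checked in floating point. The sketch below works from the quantities used in the proof, namely r = 1 − ma, the allocation $n_k = r^{p-k}(1 - r)N/(1 - r^p)$, and the choice $r^p = \sigma^2/(Nm^2V^2 + \sigma^2)$; the resulting $n_k$ are treated as real numbers, so it illustrates only the identity for the error, not the integrality assumption of the theorem. The sample values and the function name `fixed_step_design` are assumptions.

```python
import math


def fixed_step_design(m, sigma2, V2, N, p):
    """Return (a, [n_1, ..., n_p], error) for the fixed-step allocation.

    The error is r^(2p) V^2 + a^2 sigma^2 * sum_k r^(2(p-k)) / n_k, i.e. the
    iterated recursion in the extremal case where it holds with equality.
    """
    rho = sigma2 / (sigma2 + N * m * m * V2)   # value assigned to r**p
    r = rho ** (1.0 / p)
    a = (1.0 - r) / m
    n = [r ** (p - k) * (1.0 - r) * N / (1.0 - r ** p) for k in range(1, p + 1)]
    error = r ** (2 * p) * V2 + a * a * sigma2 * sum(
        r ** (2 * (p - k)) / n[k - 1] for k in range(1, p + 1))
    return a, n, error


# Sample values (assumptions): m = 1, M = 2, sigma^2 = V^2 = 1, N = 10, p = 3.
m, M, sigma2, V2, N, p = 1.0, 2.0, 1.0, 1.0, 10, 3
a, n, error = fixed_step_design(m, sigma2, V2, N, p)
target = sigma2 * V2 / (sigma2 + N * m * m * V2)
```

For these values the allocation sums to N, the step size stays below 2/(M + m), and `error` matches `target` up to rounding, the same bound as in the two earlier theorems.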

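PROOF.

$$X_{k+1} - \theta = (1 - ag(X_k))(X_k - \theta) - \frac{a}{n_k}\bigl[\epsilon_k^{(1)} + \cdots + \epsilon_k^{(n_k)}\bigr],$$

$$E(X_{k+1} - \theta)^2 \le E(1 - ag(X_k))^2(X_k - \theta)^2 + \frac{a^2\sigma^2}{n_k}.$$

Again $1 - ag(X_k) \le 1 - ma$ and from (a) and (b) it follows that
$a \le 2/(M + m)$
so that $ag(X_k) - 1 \le aM - 1 \le 1 - ma$. Hence

$$E(X_{k+1} - \theta)^2 \le (1 - ma)^2E(X_k - \theta)^2 + (a^2\sigma^2)/n_k.$$

Iterating we get

(8) $E(X_{p+1} - \theta)^2 \le (1 - ma)^2\left[(1 - ma)^2E(X_{p-1} - \theta)^2 + \dfrac{a^2\sigma^2}{n_{p-1}}\right] + \dfrac{a^2\sigma^2}{n_p} \le \cdots \le (1 - ma)^{2p}V^2 + a^2\sigma^2\left[\dfrac{1}{n_p} + \dfrac{(1 - ma)^2}{n_{p-1}} + \dfrac{(1 - ma)^4}{n_{p-2}} + \cdots + \dfrac{(1 - ma)^{2(p-1)}}{n_1}\right].$

Using (c), (d), we now get (e). The argument for (f) follows as before; namely,
if g(x) = m, var $(\epsilon_k^{(j)}) = \sigma^2$, then the equality in (8) holds, and the unique
minimum, subject to the constraint (4), occurs when a, $n_k$ satisfy (c), (d). To
see this let r = 1 − ma so that

(9) $E(X_{p+1} - \theta)^2 = r^{2p}V^2 + \dfrac{(1 - r)^2\sigma^2}{m^2}\left[\dfrac{1}{N - n_1 - n_2 - \cdots - n_{p-1}} + \dfrac{r^2}{n_{p-1}} + \cdots + \dfrac{r^{2(p-1)}}{n_1}\right].$

From

$$\frac{\partial E(X_{p+1} - \theta)^2}{\partial n_k} = 0$$

we get

(10) $n_k = r^{p-k}(N - n_1 - \cdots - n_{p-1}) = r^{p-k}n_p \qquad (k = 1, \ldots, p - 1).$

Hence

$$N = n_p\,(1 + r + \cdots + r^{p-1})$$

and

(11) $n_p = \dfrac{(1 - r)N}{1 - r^p}.$

Using (11) in (10) we get $n_k = [r^{p-k}(1 - r)N]/(1 - r^p)$, k = 1, ..., p − 1.
Using this in (9) we get

$$E(X_{p+1} - \theta)^2 = r^{2p}V^2 + \frac{(1 - r)(1 - r^p)\sigma^2}{m^2N}\sum_{k=1}^{p} r^{p-k} = r^{2p}V^2 + \frac{(1 - r^p)^2\sigma^2}{m^2N}.$$

This has its minimum when $r^p = \sigma^2 / (Nm^2V^2 + \sigma^2)$. The corresponding values
of a, $n_k$ are given by (c), (d), and so $E(X_{p+1} - \theta)^2 \ge (\sigma^2V^2) / (\sigma^2 + Nm^2V^2)$
with equality holding if and only if (c) and (d) are satisfied.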

REMARK 5. The estimate of error in each theorem ((γ), (iv), (e)) is the same.
In Theorem 2 it is independent of how the N observations are partitioned
amongst $n_1, \ldots, n_p$, as long as (ii) is satisfied. In Theorem 3 this partition
affects the estimate while the choice of p (subject to condition (a)) does not.
If it costs less to take several observations at the same point than the same
number of observations at various points then one will keep p as small as possible
in each case.

Of course with these estimates of error we can at once give confidence intervals
for $\theta$, via Tchebycheff's inequality.

REMARK 6. Similar estimates can be made for a process taking place in a
Hilbert space; for example suppose we wish to find a solution $\theta$ to the equation
$K(\theta) = \alpha$ where K is an operator satisfying $\|K(x) - \alpha\|^2 \le C\|x - \theta\|^2$,
$C < \infty$, $(K(x) - K(\theta), x - \theta) \ge c\|x - \theta\|^2$, $c > 0$. (For example, a positive
definite continuous linear operator has this property; cf. Blum [2].) If
$X_{n+1} = X_n - a_n(K(X_n) - \alpha + \epsilon_n)$, where $\epsilon_n$ is a vector; and if $\mathcal{E}$ is an additive and
homogeneous function such that $\mathcal{E}(\epsilon_n, g(X_n)) = 0$ for any vector function
g(X) and $\mathcal{E}(\|\epsilon_n\|^2) \le \sigma^2$, then

$$\mathcal{E}\|X_{n+1} - \theta\|^2 \le (1 - 2a_nc + a_n^2C)\,\mathcal{E}\|X_n - \theta\|^2 + a_n^2\sigma^2.$$

The optimal $a_n$'s and best estimates of error are now obtained from the
recursions $a_n = (cB_n) / (CB_n + \sigma^2)$, $B_{n+1} = (1 - 2a_nc + a_n^2C)B_n + a_n^2\sigma^2$. The
modified procedures may be treated similarly; now, however, it turns out that the
estimate of error in case (i) does depend on the partition $(n_1, \ldots, n_p)$ of N.

ON THE IDENTIFIABILITY PROBLEM FOR FUNCTIONS
OF FINITE MARKOV CHAINS


By David Blackwell and Lambert Koopmans
University of California, Berkeley
1. Summary. Let $M = \| m_{ij} \|$ be a 4 × 4 irreducible aperiodic Markov
matrix such that $h_1 \ne h_2$, $h_3 \ne h_4$, where $h_i = m_{i1} + m_{i2}$. Let $x_1, x_2, \ldots$
be a stationary Markov process with transition matrix M, and let $y_n = 0$ when
$x_n = 1$ or 2, $y_n = 1$ when $x_n = 3$ or 4. For any finite sequence $s = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)$
of 0's and 1's, let $p(s) = \Pr\{y_1 = \epsilon_1, \ldots, y_n = \epsilon_n\}$. If

(1) $p^2(00) \ne p(0)p(000)$ and $p^2(01) \ne p(1)p(010)$,

the joint distribution of $y_1, y_2, \ldots$ is uniquely determined by the eight
probabilities p(0), p(00), p(000), p(010), p(0000), p(0010), p(0100), p(0110), so that
two matrices M determine the same joint distribution of $y_1, y_2, \ldots$ whenever
the eight probabilities listed agree, provided (1) is satisfied. The method
consists in showing that the function p satisfies the recurrence relation

(2) $p(s, \epsilon, \delta, 0) = p(s, \epsilon, 0)\,a(\epsilon, \delta) + p(s, \epsilon)\,b(\epsilon, \delta)$

for all s and $\epsilon$ = 0 or 1, $\delta$ = 0 or 1, where $a(\epsilon, \delta)$, $b(\epsilon, \delta)$ are (easily computed)
functions of M, and noting that, if (1) is satisfied, $a(\epsilon, \delta)$ and $b(\epsilon, \delta)$ are
determined by the eight probabilities listed. The class of doubly stochastic matrices
yielding the same joint distribution for $y_1, y_2, \ldots$ is described somewhat more
explicitly, and the case of a larger number of states is considered briefly.


2. Introduction. Suppose a certain process is known to be a stationary
Markov process with N states, say 1, 2, ..., N, and unknown transition matrix
M, supposed irreducible and aperiodic. To what extent can we identify M by
successive observations on the process, if by observation we are unable to
distinguish between certain states of the process? More precisely, if $\{X_n\}$ is a
stationary Markov process with states 1, 2, ..., N and N × N irreducible
aperiodic transition matrix M, and $y_n = \phi(X_n)$, call two such M's equivalent (for
the given $\phi$) if they determine the same joint distribution of $y_1, y_2, \ldots$. Call
a finite set of functions $f_1, \ldots, f_k$, each defined on the set of all N × N
irreducible aperiodic Markov matrices, a complete set of invariants if $M_1$ and $M_2$ are
equivalent if and only if $f_i(M_1) = f_i(M_2)$ for i = 1, ..., k. Our problem is that
of finding a minimal complete set of invariants, i.e., a complete set of which no
proper subset is complete. We do not solve this problem, even in special cases,
but almost solve it in the two special cases (a) $\phi$ has only two values, one of which
is assumed at only a single state and (b) N = 4, $\phi$ has two values, each assumed
on two states. By almost solving the problem, we mean the following. Call a
set of functions $f_1, \ldots, f_k$ a complete set of invariants relative to a set X of
matrices if (1) X is a union of equivalence classes, (2) $f_i(M_1) = f_i(M_2)$ for all i implies
$M_1$ is equivalent to $M_2$, (3) if $M_1$ and $M_2$ are equivalent and in X, $f_i(M_1) = f_i(M_2)$.
Thus a complete set of invariants relative to X fails to be a complete set
only because two matrices $M_1$, $M_2$ not in X may be equivalent even though
$f(M_1) \ne f(M_2)$. In the two special cases above, we find a complete set of
invariants relative to a set X containing most matrices.

For case (a) the solution, following the methods of Feller [1], is straightforward.
Say $\phi$ assumes the value 0 on state 1, the value 1 on all other states. The joint
distribution of $y_1, y_2, \ldots$ determines and is determined by the distribution of
return times to state 1, i.e., by the sequence of numbers

$$a_n = \Pr\{x_{n+1} = 1,\; x_j \ne 1 \text{ for } 2 \le j \le n \mid x_1 = 1\},$$

which determines and is determined by its generating function $A_1(t) = \sum_{n=1}^{\infty} a_nt^n$.
Define

$$A_i(t) = \sum_{n=1}^{\infty} \Pr\{x_{n+1} = 1,\; x_j \ne 1 \text{ for } 2 \le j \le n \mid x_1 = i\}\,t^n.$$

Then the functions $A_i$, i = 1, ..., N satisfy the system

$$A_i(t) = t\left[m_{i1} + \sum_{j=2}^{N} m_{ij}A_j(t)\right], \qquad i = 1, \ldots, N.$$

Cramer's rule yields

$$A_1(t) = 1 - \frac{\det(I - tM)}{\det(J - tM_1)},$$

where I, J are the N × N and (N − 1) × (N − 1) identity matrices and $M_1$
is obtained from M by deleting the first row and column. Thus two matrices are
equivalent whenever they determine the same polynomials $P(t) = \det(I - tM)$
and $Q(t) = \det(J - tM_1)$ and, if for a given M these polynomials have no
common roots, a second M is equivalent if and only if it has the same P and Q.
Thus, on the class X of matrices for which P and Q have no common roots, the 
coefficients of P and Q are a complete set of invariants. That two matrices not in 
X may be equivalent even though the polynomials P, Q differ is shown by the 
example 


M = 
| 
~ | 
All choices of $x, y, z$ with $0 < x, y, z < 1/2$ lead to equivalent $M$'s, while $P$, $Q$ do
depend on $x, y, z$.
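The Cramer's-rule identity for the return-time generating function can be checked numerically. The following is a minimal sketch, not part of the original paper: the 3 x 3 matrix `EXAMPLE_M` and the helper names are illustrative assumptions, state 1 of the text is index 0 in the code, and exact rational arithmetic is used so that the comparison does not depend on rounding.

```python
from fractions import Fraction


def det(a):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if not a:
        return 1
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, entry in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def return_gf_closed_form(m, t):
    """A_1(t) = 1 - det(I - tM) / det(J - tM_1), with M_1 = M minus first row and column."""
    n = len(m)
    i_minus_tm = [[(1 if i == j else 0) - t * m[i][j] for j in range(n)] for i in range(n)]
    j_minus_tm1 = [row[1:] for row in i_minus_tm[1:]]
    return 1 - det(i_minus_tm) / det(j_minus_tm1)


def return_gf_series(m, t, terms):
    """Partial sum of sum_n a_n t^n, where a_n is the probability of first return at step n."""
    n = len(m)
    total = m[0][0] * t                      # a_1 = m_11
    taboo = [m[0][j] for j in range(1, n)]   # law of x_2 restricted to states other than 1
    power = t
    for _ in range(2, terms + 1):
        power *= t
        a_k = sum(taboo[j - 1] * m[j][0] for j in range(1, n))
        total += a_k * power
        taboo = [sum(taboo[i - 1] * m[i][j] for i in range(1, n)) for j in range(1, n)]
    return total


# Hypothetical irreducible aperiodic Markov matrix, chosen only for illustration.
EXAMPLE_M = [
    [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
    [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
    [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)],
]
```

For a value such as `t = Fraction(1, 2)` the truncated series and the closed form should agree to within the truncation error, and at `t = 1` the closed form equals 1 because the chain returns to state 1 with probability one.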


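3. The case (2, 2). Suppose $N = 4$ and that $\phi$ assumes two values, each on
two states. Say $\phi(1) = \phi(2) = 0$; $\phi(3) = \phi(4) = 1$. Let $h_i = m_{i1} + m_{i2}$,
$i = 1, \dots, 4$; we assume $h_1 \ne h_2$, $h_3 \ne h_4$. For any finite sequence
$s = (\varepsilon_1, \dots, \varepsilon_n)$ of 0's and 1's, let $p(s) = \Pr\{(y_1, \dots, y_n) = s\}$. We shall prove

THEOREM 1. The function $p$ satisfies

\[ p(s, \varepsilon, \delta, 0) = p(s, \varepsilon, 0)\,a(\varepsilon, \delta) + p(s, \varepsilon)\,b(\varepsilon, \delta) \tag{2} \]

for all $s$ and $\varepsilon = 0$ or 1, $\delta = 0$ or 1, where

\[ a(0, \delta) = \frac{\Pr\{(y_2, y_3) = (\delta, 0) \mid x_1 = 1\} - \Pr\{(y_2, y_3) = (\delta, 0) \mid x_1 = 2\}}{h_1 - h_2}, \]

\[ a(1, \delta) = \frac{\Pr\{(y_2, y_3) = (\delta, 0) \mid x_1 = 3\} - \Pr\{(y_2, y_3) = (\delta, 0) \mid x_1 = 4\}}{h_3 - h_4}, \]

\[ b(0, \delta) = \frac{h_1 \Pr\{(y_2, y_3) = (\delta, 0) \mid x_1 = 2\} - h_2 \Pr\{(y_2, y_3) = (\delta, 0) \mid x_1 = 1\}}{h_1 - h_2}, \]

\[ b(1, \delta) = \frac{h_3 \Pr\{(y_2, y_3) = (\delta, 0) \mid x_1 = 4\} - h_4 \Pr\{(y_2, y_3) = (\delta, 0) \mid x_1 = 3\}}{h_3 - h_4}. \]

PROOF. For any $s$ of length $n$ and any $i = 1, \dots, 4$, let
$q(s, i) = \Pr\{(y_1, \dots, y_n) = s,\ x_{n+1} = i\}$. Then

\[ p(s, \varepsilon, \delta, 0) = \sum_{\phi(i) = \varepsilon,\ \phi(j) = \delta} q(s, i)\, m_{ij}\, h_j. \tag{3} \]

Fix $i$ with $\phi(i) = \varepsilon$ and denote by $i^*$ the state different from $i$ for which $\phi(i) = \phi(i^*)$. Then

\[ p(s, \varepsilon, 0) - q(s, i)h_i = h_{i^*}\bigl(p(s, \varepsilon) - q(s, i)\bigr), \tag{4} \]

since each side is $\Pr\{(y_1, \dots, y_n) = s,\ x_{n+1} = i^*,\ y_{n+2} = 0\}$. Solving (4) for
$q(s, i)$ and substituting in (3) expresses $p(s, \varepsilon, \delta, 0)$ as a linear combination of
$p(s, \varepsilon, 0)$, $p(s, \varepsilon)$ whose coefficients are functions of $M$, $\varepsilon$, $\delta$. These coefficients are
the quantities denoted by $a(\varepsilon, \delta)$, $b(\varepsilon, \delta)$ in (2).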
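The recursion of Theorem 1 can be verified exactly on a small example. The sketch below is illustrative only: the matrix `M` is a hypothetical doubly stochastic choice (so that the stationary distribution is uniform and no linear solve is needed), states 1 to 4 are indexed 0 to 3, and the function names are assumptions rather than notation from the paper.

```python
from fractions import Fraction as Fr
from itertools import product

PHI = (0, 0, 1, 1)  # phi(1) = phi(2) = 0, phi(3) = phi(4) = 1 (states indexed from 0)

# Hypothetical doubly stochastic matrix with h_1 != h_2 and h_3 != h_4.
M = [
    [Fr(4, 10), Fr(3, 10), Fr(2, 10), Fr(1, 10)],
    [Fr(1, 10), Fr(2, 10), Fr(3, 10), Fr(4, 10)],
    [Fr(3, 10), Fr(1, 10), Fr(4, 10), Fr(2, 10)],
    [Fr(2, 10), Fr(4, 10), Fr(1, 10), Fr(3, 10)],
]
LAM = [Fr(1, 4)] * 4                   # stationary distribution (uniform here)
H = [row[0] + row[1] for row in M]     # h_i = Pr{y_2 = 0 | x_1 = i}


def p(s):
    """Pr{(y_1, ..., y_n) = s} computed by a forward recursion over the hidden states."""
    if not s:
        return Fr(1)
    v = [LAM[i] if PHI[i] == s[0] else Fr(0) for i in range(4)]
    for sym in s[1:]:
        v = [sum(v[i] * M[i][j] for i in range(4)) if PHI[j] == sym else Fr(0)
             for j in range(4)]
    return sum(v)


def cond(i, delta):
    """Pr{(y_2, y_3) = (delta, 0) | x_1 = i}."""
    return sum(M[i][j] * H[j] for j in range(4) if PHI[j] == delta)


def coefficients(eps, delta):
    """The quantities a(eps, delta), b(eps, delta) of Theorem 1."""
    i, i_star = (0, 1) if eps == 0 else (2, 3)
    d = H[i] - H[i_star]
    a = (cond(i, delta) - cond(i_star, delta)) / d
    b = (H[i] * cond(i_star, delta) - H[i_star] * cond(i, delta)) / d
    return a, b


def check_recursion(max_len):
    """True if equation (2) holds for every prefix s of length at most max_len."""
    for eps, delta in product((0, 1), repeat=2):
        a, b = coefficients(eps, delta)
        for length in range(max_len + 1):
            for s in product((0, 1), repeat=length):
                lhs = p(s + (eps, delta, 0))
                rhs = p(s + (eps, 0)) * a + p(s + (eps,)) * b
                if lhs != rhs:
                    return False
    return True
```

Calling `check_recursion(3)` exercises the identity for all prefixes of length up to three, including the empty prefix used later in the proof of Corollary 2.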

COROLLARY 1. The distribution of $y_1, y_2, \dots$ is determined by $p(0)$, $p(00)$ and
the functions $a(\varepsilon, \delta)$, $b(\varepsilon, \delta)$.

PROOF. We have $p(1) = 1 - p(0)$ and, since the $\{y_n\}$ process is stationary,
$p(10) = p(01) = p(0) - p(00)$, so that $p(11) = 1 - 2p(0) + p(00)$. Thus
$p(s)$ is determined if the length of $s$ does not exceed 2. Equation (2) determines $p(s, 0)$
in terms of $p$ for shorter sequences and $a(\varepsilon, \delta)$, $b(\varepsilon, \delta)$, and $p(s, 1) = p(s) - p(s, 0)$,
so that, by induction, $p$ is determined for all $s$.

COROLLARY 2. On the set $X$ of matrices for which $p^2(00) \ne p(0)p(000)$ and
$p^2(01) \ne p(1)p(010)$, the eight functions $p(0)$, $p(00)$, $p(0, \varepsilon, 0)$, $p(0, \varepsilon, \delta, 0)$,
where $\varepsilon = 0$ or 1, $\delta = 0$ or 1, are a complete set of invariants.

PROOF. Letting $s$ be empty and the sequence 0 in (2) yields

\[ p(\varepsilon, \delta, 0) = p(\varepsilon, 0)\,a(\varepsilon, \delta) + p(\varepsilon)\,b(\varepsilon, \delta), \]
\[ p(0, \varepsilon, \delta, 0) = p(0, \varepsilon, 0)\,a(\varepsilon, \delta) + p(0, \varepsilon)\,b(\varepsilon, \delta). \]

Thus if $p(\varepsilon, 0)p(0, \varepsilon) \ne p(\varepsilon)p(0, \varepsilon, 0)$ for $\varepsilon = 0$ or 1, the functions $a(\varepsilon, \delta)$, $b(\varepsilon, \delta)$
are determined by $p(s)$ for $s$ of length not exceeding 4, so that the latter set is a
complete set of invariants on $X$ by Corollary 1. Since $\{y_n\}$ is stationary, $p(s)$
for all $s$ of length not exceeding four is determined by the eight probabilities
described in the corollary, so that this set is complete on $X$.

Thus, since there are twelve parameters in a 4 X 4 Markov matrix and an 
equivalence class is defined by eight restrictions, there is in general a four- 
parameter set of matrices equivalent to a given matrix. An explicit parametric 
representation of the equivalence classes has not been found. 

For the case of doubly stochastic matrices, in which there are nine parameters,
it turns out that an equivalence class is determined by seven restrictions, so that,
in general, there is a two-parameter set of doubly stochastic matrices equivalent
to a given doubly stochastic matrix. Moreover an explicit representation can
be given, as follows: For any 4 × 4 doubly stochastic matrix for which $h_1 \ne h_2$,
$h_3 \ne h_4$, there is a unique set of numbers $\sigma, a, A, b, B, d, D, x, y$ for which the
matrix has the form $(U_1, U_2, U_3, U_4)$, where the column vectors are given by
+a+2+ (d/z) jo -~a+z2 — (d/z) 
—a—2x+ (d/z) +a-—z2x — (d/z) 

—o —y — (d/z) + b(y/x) |? ~ —¢ — y+ (d/z) — b(y/z) 
—ao+y— (d/zx) — b(y/z) 4—o+y— (d/x) — b(y/z) 


—o —x — (D/y) + Bir/y) —o—x+ (D/y) — Br/y) 

—~o+a2— (D/y) — Bx/y) —o+2+ (D/y) + Blxr/y) 

+A+ty+t (D/y) | ~A+y— (D/y) 7 
—-A—yt (/y) +A —y— (D/y) 

It is a tedious but straightforward matter to check that $p(0)$ ($= 1/2$), $p(00)$ ($= \sigma$)
and the functions $a(\varepsilon, \delta)$, $b(\varepsilon, \delta)$ determine and are determined by $\sigma, a, A, b, B,
d, D$, and that the restrictions $p(\varepsilon, 0)p(0, \varepsilon) \ne p(\varepsilon)p(0, \varepsilon, 0)$ assert $d \ne 0$, $D \ne 0$.
Thus any choice of $x, y$ for which all elements remain nonnegative produces a
doubly stochastic matrix equivalent to the original, and every such matrix may
be obtained for some $x, y$.


U; = 


wi wm Q Q 


U, = 


Q QQ Ni wR 


1 
2 
1 
o 
o 


4. A large complete set of invariants. For any $N \times N$ irreducible aperiodic
$M$ and any $\phi$, let $R$ be the range of $\phi$ and let $S$ be the set of all finite sequences
$s = (r_1, \dots, r_k)$, $k = 0, 1, 2, \dots$, $r_i \in R$. For each $s$ the function $p_s(M) =
\Pr\{(y_1, \dots, y_k) = s\}$, as a function of $M$, is invariant, that is, $p_s(M_1) = p_s(M_2)$
if $M_1$ and $M_2$ are equivalent.

THEOREM 2. There exists a positive integer $J$, depending only on $N$ and $\phi$, such
that the set of functions $p_s$ for $s$ not exceeding $J$ in length is a complete set of
invariants, that is, the joint distribution of $y_1, y_2, \dots$ is determined by the joint
distribution of $y_1, \dots, y_J$.

PROOF. For any $s = (r_1, \dots, r_k)$, $k \ge 2$, we have

\[ p_s(M) = \sum_{\phi(i_1) = r_1, \dots, \phi(i_k) = r_k} \lambda_{i_1} m_{i_1 i_2} \cdots m_{i_{k-1} i_k}
 = \lambda M(r_1, r_2) M(r_2, r_3) \cdots M(r_{k-1}, r_k)\,\delta, \qquad \lambda = (\lambda_1, \dots, \lambda_N), \]

where $\lambda_i = \Pr\{x_1 = i\}$, $M(r, r')$ is the matrix obtained from $M$ by replacing
$m_{ij}$ by 0 unless $\phi(i) = r$, $\phi(j) = r'$, and $\delta$ is the $N \times 1$ column vector with each
element unity. Write $M(s) = M(r_1, r_2) M(r_2, r_3) \cdots M(r_{k-1}, r_k)$. Let $F$ be a
second $N \times N$ irreducible aperiodic Markov matrix. Then $p_s(M) = p_s(F)$ if
and only if $\lambda M(s)\delta = \mu F(s)\delta$, where $\mu$ is the stationary distribution for $F$. We
must find a $J$ such that $\lambda M(s)\delta = \mu F(s)\delta$ for all $s$ of length not exceeding $J$
implies equality for all $s$. Let $A(s)$ be the $2N \times 2N$ matrix with $M(s)$ in the upper
left, $F(s)$ in the lower right, and zeros elsewhere, so that $\lambda M(s)\delta = \mu F(s)\delta$ may
be written $\alpha A(s) d = 0$, where $\alpha = (\lambda, -\mu)$ and $d$ is the $2N \times 1$ column vector
whose elements are unity. If we consider the class of $2N \times 2N$ matrices as a
linear space of $4N^2$ dimensions, the set of matrices $A(s)$ spans a subspace $L$ of
dimension $J - 1 \le 4N^2$. It remains to show only that the set of matrices $A(s)$
for the length of $s$ not exceeding $J$ already spans $L$, for if so then any $A(s)$ is a
linear combination of these, and $\alpha A(s) d = 0$ whenever the length of $s$ is $\le J$
implies $\alpha A(s) d = 0$ for all $s$. Let $L_k$ denote the linear space spanned by the
matrices $A(s)$ for which the length of $s$ does not exceed $k$. If $L_{k+1} = L_k$ then
$L_{k+2} = L_{k+1}$, for say $s = (r, s')$ where $r \in R$ and $s'$ has length $k + 1$. Then
$A(s) = A(r, r')A(s')$, where $r'$ is the initial element of $s'$. Now by hypothesis
$A(s') = \sum_{t \in T} a(t)A(t)$, where $T$ is the set of sequences of length $\le k$, so that
$A(s) = \sum_{t \in T} a(t)A(r, r')A(t)$. Unless $r'$ is the initial element of $t$, $A(r, r')A(t) = 0$,
so that $A(s) = \sum a(t)A(r, t)$ where the sum is over those $t \in T$ whose initial
element is $r'$. Thus $A(s) \in L_{k+1}$. We have $L_2 \subset L_3 \subset \cdots \subset L_n \subset \cdots$, with
equality for sufficiently large $n$. If equality first occurs at $k$, that is, $L_k = L_{k+1}$,
we have $L_k = L$. The dimension of $L$ is at least $k - 1$, so that $J - 1 \ge k$
and $L_J = L$, completing the proof.

The $J$ obtained in the theorem, namely $J = 4N^2 + 1$, is extremely crude.
It can be improved somewhat by a more careful bound on the dimension of $L$.
For instance, since all $A(s)$ have zeros in the lower left and upper right blocks,
the actual dimension of $L$ is at most $2N^2$, so that $J = 2N^2 + 1$ will suffice.
However, if $\phi$ is the identity function, $L$ may actually have dimension $2N^2$,
while $J = 2$ will suffice, so that, if we are to find the smallest $J$, a different
approach is required.
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TESTS OF FIT IN THE PRESENCE OF NUISANCE LOCATION 
AND SCALE PARAMETERS' 


By Lionel Weiss
Cornell University 
1. Summary. Certain functions of the sample spacings are shown to converge
stochastically as the sample size increases. This leads to certain convenient
tests of fit which are consistent against wide classes of alternatives.


2. The stochastic convergence of certain functions of sample spacings.
Suppose $X_1, X_2, \dots, X_n$ are $n$ independent and identically distributed chance
variables, each with density function $f(x)$. Let $Y_1 \le Y_2 \le \cdots \le Y_n$ denote the
ordered values of $X_1, X_2, \dots, X_n$, and let $U_i$ denote $Y_{i+1} - Y_i$ ($i = 1, \dots,
n - 1$). Let $g(v)$ be a bounded nonnegative function of $v$ defined for $0 \le v \le 1$,
and $r$ be a number greater than or equal to unity. Define the chance variable
$U(r)$ as $n^{r-1} \sum_{i=1}^{n-1} g(i/n) U_i^r$. Then we have

THEOREM 2.1. If $f(x) = 1$ for $0 \le x \le 1$, and $f(x) = 0$ elsewhere, then

\[ U(r) - \Gamma(r + 1)\, n^{r-1}\, \frac{n!}{\Gamma(n + r + 1)} \sum_{i=1}^{n-1} g(i/n) \]

converges stochastically to zero as $n$ increases.

PROOF. It is shown in [1] that for any positive number $s$,

\[ E\{U_i^s\} = \frac{n!\,\Gamma(s + 1)}{\Gamma(n + s + 1)} \]

for any $i$, and $E\{U_i^s U_j^s\} = n!\,\Gamma^2(s + 1)/\Gamma(n + 2s + 1)$ for any $i \ne j$.
From this, we find immediately that

\[ E\{U(r)\} = \Gamma(r + 1)\, n^{r-1}\, \frac{n!}{\Gamma(n + r + 1)} \sum_{i=1}^{n-1} g(i/n), \]

and, remembering that $g(v)$ is bounded, we find that the variance of $U(r)$
approaches zero as $n$ increases. This completes the proof.

COROLLARY. If $f(x) = 1$ for $0 \le x \le 1$, and $f(x) = 0$ elsewhere, and $\int_0^1 g(v)\,dv$
exists (in the Riemann sense), then $U(r)$ converges stochastically to

\[ \Gamma(r + 1) \int_0^1 g(v)\,dv \]

as $n$ increases.

PROOF. If $\int_0^1 g(v)\,dv$ exists, $n^{r-1}\,[n!/\Gamma(n + r + 1)] \sum_{i=1}^{n-1} g(i/n)$ approaches
it as $n$ increases.
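The corollary lends itself to a quick simulation. The sketch below is an illustration under stated assumptions, not the author's computation: the function names, the weight $g(v) = v$, the exponent $r = 2$, and the sample size are all arbitrary choices. It draws a uniform sample, forms $U(r)$ from the spacings, and also evaluates the centering constant of Theorem 2.1 using log-gamma functions to avoid overflow.

```python
import math
import random


def spacing_statistic(n, r, g, rng):
    """U(r) = n^(r-1) * sum_{i=1}^{n-1} g(i/n) * (Y_{i+1} - Y_i)^r for a uniform sample."""
    y = sorted(rng.random() for _ in range(n))
    total = 0.0
    for i in range(1, n):
        total += g(i / n) * (y[i] - y[i - 1]) ** r
    return n ** (r - 1) * total


def centering(n, r, g):
    """Gamma(r+1) * n^(r-1) * n! / Gamma(n+r+1) * sum_{i=1}^{n-1} g(i/n)."""
    log_c = (math.lgamma(r + 1) + (r - 1) * math.log(n)
             + math.lgamma(n + 1) - math.lgamma(n + r + 1))
    return math.exp(log_c) * sum(g(i / n) for i in range(1, n))


def limit_value(r, integral_of_g):
    """Gamma(r+1) times the integral of g over [0, 1], the limit in the corollary."""
    return math.gamma(r + 1) * integral_of_g
```

With $g(v) = v$ and $r = 2$ the limit is $\Gamma(3) \cdot \tfrac12 = 1$, so for a sample of a few thousand points both `spacing_statistic` and `centering` should be close to 1.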

Let us denote $\int_{-\infty}^{x} f(y)\,dy$ by $F(x)$. Suppose that on the interval $[A, B]$, $f(x)$
is continuous and $f(x) \ge D > 0$ for each $x$ in $[A, B]$. Suppose $h(v)$ is a nonnegative
bounded function of $v$ ($0 \le v \le 1$), and $\int_0^1 h(v)\,dv$ exists (in the Riemann
sense). Define the chance variable $R$ as $\sum_{nF(A) < j < nF(B)} h(j/n)(Y_{j+1} - Y_j)$
and $S$ as $n \sum_{nF(A) < j < nF(B)} h(j/n)[Y_{j+1} - Y_j]^2$. Then we have

THEOREM 2.2. As $n$ increases, $R$ converges stochastically to
$\int_{F(A)}^{F(B)} \{h(v)/f[F^{-1}(v)]\}\,dv$, and $S$ converges stochastically to
$2\int_{F(A)}^{F(B)} \{h(v)/\{f[F^{-1}(v)]\}^2\}\,dv$.

PROOF. If $Y_{j+1}$ and $Y_j$ are both in the interval $[A, B]$, we may write

\[ F(Y_{j+1}) - F(Y_j) = f(\theta_j)(Y_{j+1} - Y_j), \]

where $Y_j \le \theta_j \le Y_{j+1}$. Let $F_n^*(x)$ denote the empirical distribution function
based on $X_1, X_2, \dots, X_n$. That is, for each $x$, $F_n^*(x)$ equals the proportion of
the values $(X_1, X_2, \dots, X_n)$ which are no greater than $x$. Define $\Delta_i$ as

\[ |F(Y_i) - (i/n)|, \]

which is the same as $|F(Y_i) - F_n^*(Y_i)|$. It is well known that if $\delta$ is any positive
number, then $\max_i n^{1/2 - \delta} \Delta_i$ converges stochastically to zero as $n$ increases.
Define $\delta_i$ as $|Y_i - F^{-1}(i/n)|$. If $Y_i$ and $F^{-1}(i/n)$ are both in the interval $[A, B]$,
then, since $f(x) \ge D$ on that interval, we have $\delta_i \le \Delta_i/D$. Then, if $Y_j$, $Y_{j+1}$,
$F^{-1}(j/n)$, and $F^{-1}[(j + 1)/n]$ are all in $[A, B]$, we have $|\theta_j - F^{-1}(j/n)| \le
1/(nD) + \delta_j + \delta_{j+1}$, and we can write

\[ F(Y_{j+1}) - F(Y_j) = f\Bigl[F^{-1}\Bigl(\frac{j}{n}\Bigr)\Bigr](Y_{j+1} - Y_j) + \gamma_j (Y_{j+1} - Y_j), \tag{2.2.1} \]

where $\gamma_j = f(\theta_j) - f[F^{-1}(j/n)]$. But because of the uniform continuity of $f(x)$
in $[A, B]$, the inequality for $|\theta_j - F^{-1}(j/n)|$, and the Glivenko-Cantelli theorem,
it is easily seen that $\max_{nF(A) < j < nF(B)} |\gamma_j|$ converges stochastically to zero as $n$
increases. We denote $F(Y_{j+1}) - F(Y_j)$ by $U_j$, and note that $\{U_j\}$ has the same
distribution as in Theorem 2.1. From (2.2.1) we have

\[ \sum_{nF(A) < j < nF(B)} \frac{h(j/n)\,U_j}{f[F^{-1}(j/n)]}
 = \sum_{nF(A) < j < nF(B)} h\Bigl(\frac{j}{n}\Bigr)(Y_{j+1} - Y_j)
 + \sum_{nF(A) < j < nF(B)} \frac{h(j/n)\,\gamma_j\,(Y_{j+1} - Y_j)}{f[F^{-1}(j/n)]}. \tag{2.2.2} \]

The expression on the left of (2.2.2) converges stochastically to

\[ \int_{F(A)}^{F(B)} \{h(v)/f[F^{-1}(v)]\}\,dv, \]

by the corollary to Theorem 2.1. Let us denote the second term on the right of
(2.2.2) by $R'$. As $n$ increases, the probability that $|R'|$ will be no greater than
$(\max_j |\gamma_j|/D)R$ approaches one. This means that $|R'|/R$ converges stochastically
to zero as $n$ increases. But $R + R'$ converges stochastically to

\[ \int_{F(A)}^{F(B)} \{h(v)/f[F^{-1}(v)]\}\,dv \]

as $n$ increases. This proves Theorem 2.2 as far as $R$ is concerned. The proof for
$S$ is entirely similar.


3. Application to tests of fit. We need the following lemma.

LEMMA 3.1. If $F(x)$ and $G(x)$ are two distribution functions, and $u$, $v$ ($0 \le u <
v \le 1$) are two given numbers, suppose $F^{-1}(u)$, $F^{-1}(v)$, $G^{-1}(u)$, $G^{-1}(v)$ are all
uniquely determined. Also suppose that $F(x)$ has a derivative $f(x)$ between $F^{-1}(u)$
and $F^{-1}(v)$, and $G(x)$ has a derivative $g(x)$ between $G^{-1}(u)$ and $G^{-1}(v)$. Then a
necessary and sufficient condition that $f[F^{-1}(r)] = k g[G^{-1}(r)]$ for almost all $r$ in $[u, v]$
(where $k$ is a positive constant) is that there are two constants $C$, $D$ ($C > 0$), such
that $F(Cx + D) = G(x)$ for all $x$ in the interval $[G^{-1}(u), G^{-1}(v)]$.

PROOF.

(a) Sufficiency. Suppose there are constants $C$, $D$ such that $F(Cx + D) =
G(x)$ for all $x$ in $[G^{-1}(u), G^{-1}(v)]$. For any such $x$, $Cf(Cx + D) = g(x)$. There is an
$r$ in $[u, v]$ such that $x = G^{-1}(r)$. Then $Cx + D = F^{-1}(r)$, so that $Cf[F^{-1}(r)] =
g[G^{-1}(r)]$. For any $r$ in $[u, v]$ we can find a value $x$ so $r = G(x)$, and this
completes the proof of sufficiency.

(b) Necessity. Suppose $f[F^{-1}(r)] = k g[G^{-1}(r)]$ for almost all $r$ in $[u, v]$. Since
$F[F^{-1}(r)] = r$, we find by differentiation that $(d/dr)F^{-1}(r) = 1/f[F^{-1}(r)]$
wherever $f[F^{-1}(r)]$ is positive. Therefore, at each $r$ in $[u, v]$ at which $f[F^{-1}(r)] > 0$,
$(d/dr)G^{-1}(r) = k\,(d/dr)F^{-1}(r)$. This implies that for all $r$ in $[u, v]$, $G^{-1}(r) =
kF^{-1}(r) + b$, $b$ a constant, or $F^{-1}(r) = KG^{-1}(r) + B$, $B$, $K$ constants with $K$
positive. There is a value $x$ in $[G^{-1}(u), G^{-1}(v)]$ with $r = G(x)$. Then $F^{-1}[G(x)] =
Kx + B$, or $G(x) = F(Kx + B)$ for all $x$ in $[G^{-1}(u), G^{-1}(v)]$. This completes
the proof of Lemma 3.1.

As an application, suppose we are to test the hypothesis to be described. $X_1,
X_2, \dots, X_n$ are known to be independent and identically distributed chance
variables. Two known constants $u$, $v$ ($0 \le u < v \le 1$) are given. The hypothesis
is that the common distribution function $F(x)$ of $X_i$ is, for each $x$ in the interval
$[F^{-1}(u), F^{-1}(v)]$, equal to $G(Cx + D)$, where $C$, $D$ are some unspecified constants
($C > 0$), and the distribution function $G(x)$ is specified. We assume that
for each $x$ in the interval $[G^{-1}(u), G^{-1}(v)]$, $G(x)$ has a derivative $g(x)$, with $g(x) \ge
A > 0$, and $g(x)$ has at most a finite number of discontinuities in $[G^{-1}(u), G^{-1}(v)]$.

We propose to test the hypothesis just described by means of the following
statistic:

\[ Z_n = \frac{n \sum_{nu < j < nv} g^2[G^{-1}(j/n)]\,(Y_{j+1} - Y_j)^2}
 {\Bigl\{\sum_{nu < j < nv} g[G^{-1}(j/n)]\,(Y_{j+1} - Y_j)\Bigr\}^2}. \]

From Theorem 2.2, we know that if the true common distribution $F(x)$ has a
derivative $f(x)$ on the interval $[F^{-1}(u), F^{-1}(v)]$ with at most a finite number of
discontinuities in that interval, and if $f(x) \ge A' > 0$ on the interval, then
$Z_n$ converges stochastically to

\[ \frac{2\int_u^v \bigl\{g[G^{-1}(x)]/f[F^{-1}(x)]\bigr\}^2\,dx}
 {\Bigl[\int_u^v \bigl\{g[G^{-1}(x)]/f[F^{-1}(x)]\bigr\}\,dx\Bigr]^2} \tag{3.1} \]

as $n$ increases. From Lemma 3.1, we know that $f[F^{-1}(x)] = k g[G^{-1}(x)]$ almost
everywhere on $[u, v]$ if and only if the hypothesis is true. Therefore if the
hypothesis is true, $Z_n$ converges stochastically to $2/(v - u)$. If the hypothesis
is not true, but $F(x)$ satisfies the conditions that guarantee that $Z_n$ converges
to (3.1), we see that $Z_n$ converges stochastically to

\[ \frac{2\int_u^v h^2(x)\,dx}{\Bigl[\int_u^v h(x)\,dx\Bigr]^2} \tag{3.2} \]

as $n$ increases, where $h(x)$ is a certain function not equal to a constant almost
everywhere on $[u, v]$. But then, by the Schwarz inequality, (3.2) has a value greater than $2/(v - u)$.
Therefore the test of the hypothesis which rejects when $Z_n$ is "too large" is consistent
against any alternative $F(x)$ with a density function bounded away from zero
and with a finite number of discontinuities on the interval $[F^{-1}(u), F^{-1}(v)]$.
If $F(x)$ assigns zero probability to a subinterval of $[F^{-1}(u), F^{-1}(v)]$ of positive
length, while $F^{-1}(v) - F^{-1}(u)$ is finite, it is easily verified that $Z_n$ approaches
infinity with probability one as $n$ increases. Thus the test based on $Z_n$ is
consistent against a very wide class of alternatives. Another advantage of the test
is that the distribution of $Z_n$ does not depend upon the unknown parameters
when the hypothesis is true. Furthermore, the computation of $Z_n$ is fairly easy
if a table of values of $G^{-1}(x)$ and $g(x)$ is available.
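A short computation may make the statistic concrete. The sketch below assumes the form $Z_n = Q/W^2$ with $Q = n\sum g^2[G^{-1}(j/n)](Y_{j+1}-Y_j)^2$ and $W = \sum g[G^{-1}(j/n)](Y_{j+1}-Y_j)$, sums taken over $nu < j < nv$; the choice of a standard normal $G$, the cut points $u = 0.1$, $v = 0.9$, and the function name are illustrative assumptions only.

```python
import random
from statistics import NormalDist


def z_statistic(sample, u, v, dist=NormalDist()):
    """Z_n = Q / W^2 built from weighted spacings of the ordered sample.

    dist plays the role of the specified G; its pdf is g and its inv_cdf is G^{-1}.
    """
    y = sorted(sample)
    n = len(y)
    q = 0.0
    w = 0.0
    for j in range(1, n):                # j is the 1-based index of Y_j
        if n * u < j < n * v:
            weight = dist.pdf(dist.inv_cdf(j / n))
            gap = y[j] - y[j - 1]        # Y_{j+1} - Y_j
            q += (weight * gap) ** 2
            w += weight * gap
    return n * q / w ** 2
```

Because both $Q$ and $W^2$ scale with the square of the scale parameter and are unaffected by a shift, `z_statistic(x, u, v)` and `z_statistic([3 * t + 7 for t in x], u, v)` should agree up to rounding, which is the invariance noted in the text. For normal data the value should be near $2/(v - u) = 2.5$ when $u = 0.1$, $v = 0.9$.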


4. A conjecture about large-sample distributions. A reasonable conjecture
seems to be that the numerator and the square root of the denominator of the
chance variable $Z_n$ have a limiting distribution which is bivariate normal. The
remainder of this section will be a heuristic justification of this conjecture. We
denote $n \sum_{nu < j < nv} g^2[G^{-1}(j/n)]\,(Y_{j+1} - Y_j)^2$ by $Q$, and
$\sum_{nu < j < nv} g[G^{-1}(j/n)]\,(Y_{j+1} - Y_j)$ by $W$. $Z_n$ was defined as $Q/W^2$. From the proof of Theorem 2.2,
$W$ and $Q$ have about the same joint distribution as

\[ \sum_{nu < j < nv} \frac{g[G^{-1}(j/n)]}{f[F^{-1}(j/n)]}\,U_j
 \quad\text{and}\quad
 n \sum_{nu < j < nv} \Bigl\{\frac{g[G^{-1}(j/n)]}{f[F^{-1}(j/n)]}\,U_j\Bigr\}^2, \]

where $\{U_j\}$ have the same joint distribution as in Theorem 2.1. Thus $W$ is
(approximately) a linear combination of $U_{nu}, \dots, U_{nv}$, and $Q$ is (approximately) a
linear combination of $U_{nu}^2, \dots, U_{nv}^2$. Next we show that the mixed moments
of $\{nU_i\}$ approach the corresponding moments of $\{V_i\}$, where $V_1, V_2, \dots$
are independent chance variables, each with density $e^{-v}$ for $v > 0$. In fact,

\[ E\{(nU_{i_1})^{a_1} \cdots (nU_{i_k})^{a_k}\}
 = \frac{n^{a_1 + \cdots + a_k}\, n!\,\Gamma(a_1 + 1) \cdots \Gamma(a_k + 1)}{\Gamma(n + a_1 + \cdots + a_k + 1)}, \]

which approaches $\Gamma(a_1 + 1) \cdots \Gamma(a_k + 1)$ as $n$ increases, while

\[ E\{V_{i_1}^{a_1} \cdots V_{i_k}^{a_k}\} = \Gamma(a_1 + 1) \cdots \Gamma(a_k + 1). \]

Thus the chance variables $W$ and $Q$ are essentially linear combinations of chance
variables which in important respects act like independent chance variables in
the limit. The bivariate central limit theorem suggests the limiting normality
of the joint distribution. If this conjecture is correct, then for large samples the
approximate critical value for $Z_n$, as well as the power of the test against various
alternatives, can be very easily computed.
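The convergence of the mixed moments can be illustrated numerically. This is a sketch under the assumption that the mixed moment of the scaled spacings is $n^{a_1+\cdots+a_k}\, n!\,\prod \Gamma(a_i+1)/\Gamma(n + a_1 + \cdots + a_k + 1)$; the function names are hypothetical, and log-gamma functions are used so that large $n$ does not overflow.

```python
import math


def scaled_spacing_moment(n, exponents):
    """E{(n U_{i_1})^{a_1} ... (n U_{i_k})^{a_k}} for distinct uniform spacings."""
    total = sum(exponents)
    log_value = (total * math.log(n)
                 + math.lgamma(n + 1)
                 - math.lgamma(n + total + 1)
                 + sum(math.lgamma(a + 1) for a in exponents))
    return math.exp(log_value)


def exponential_moment(exponents):
    """E{V_1^{a_1} ... V_k^{a_k}} for independent unit exponential variables."""
    return math.prod(math.gamma(a + 1) for a in exponents)
```

For exponents such as `(2, 3)` the ratio of the two quantities equals $\prod_{j=1}^{5} n/(n+j)$, which is below 1 for every $n$ and tends to 1 at rate roughly $15/n$.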
REFERENCES 
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ON THE DISTRIBUTION OF THE NUMBER OF EXCEEDANCES 
By K. Sarkadi


Mathematical Institute of the Hungarian Academy of Sciences, 
Budapest, Hungary 


0. Introduction. In paper [1] E. J. Gumbel and H. von Schelling deal with the
distribution discussed below, called by them "distribution of the number of
exceedances."

Suppose that we have $n + N$ independent observations, regarded as two
samples of sizes $n$ and $N$, respectively, from a population with a continuous
distribution function. Let us denote by $\xi$ the number of those elements of the
sample of size $N$ which surpass (are larger than) at least $n - m + 1$ elements
of the sample of size $n$ ($1 \le m \le n$). Thus $\xi$ shows how many elements of the
second sample exceed a given order statistic of the first sample; $\xi$ is called by the
authors the "number of exceedances." The distribution of $\xi$ is given by the
following formula:

\[ p_x = P(\xi = x) = \frac{\binom{n}{m}\, m\, \binom{N}{x}}{(n + N)\binom{N + n - 1}{m + x - 1}}
 \qquad (x = 0, 1, \dots, N). \tag{1} \]

Papers [2], [3], and [4] deal also with the above distribution.

The aim of the present paper is to show that the distribution of the number of
exceedances is a special case of the Pólya distribution. In addition, relationships
to other distributions, such as Laplace's law of succession, are mentioned.


1. Comparison of the distributions. The formula defining the Pólya distribution
is as follows:

\[ P(\xi = x) = \binom{N}{x}\,
 \frac{\prod_{i=0}^{x-1} (m + iR)\ \prod_{j=0}^{N-x-1} (n - m + jR)}{\prod_{k=0}^{N-1} (n + kR)} \tag{2} \]

(see, e.g., [5], p. 12, and [6], p. 128).

If in Eq. (2) we put $n + 1$ and 1 instead of $n$ and $R$, respectively, we obtain
formula (1).
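The substitution can be confirmed in exact arithmetic. The sketch below takes the exceedance probability to be $\binom{n}{m} m \binom{N}{x} / [(n+N)\binom{N+n-1}{m+x-1}]$ and the Pólya probability to be $\binom{N}{x}\prod_{i<x}(m+iR)\prod_{j<N-x}(n-m+jR)/\prod_{k<N}(n+kR)$; the function names and the particular values of $n$, $m$, $N$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def exceedance_pmf(n, m, N, x):
    """P(xi = x): x of the N new observations exceed the m-th largest of the first n."""
    return Fraction(comb(n, m) * m * comb(N, x),
                    (N + n) * comb(N + n - 1, m + x - 1))


def polya_pmf(n, m, N, x, R):
    """Polya urn probability of x successes in N draws (n balls, m marked, R added per draw)."""
    numerator = comb(N, x)
    for i in range(x):
        numerator *= m + i * R
    for j in range(N - x):
        numerator *= n - m + j * R
    denominator = 1
    for k in range(N):
        denominator *= n + k * R
    return Fraction(numerator, denominator)


def distributions_agree(n, m, N):
    """True if the exceedance law equals the Polya law with n + 1 in place of n and R = 1."""
    return all(exceedance_pmf(n, m, N, x) == polya_pmf(n + 1, m, N, x, 1)
               for x in range(N + 1))
```

For instance, `distributions_agree(5, 2, 4)` compares the two laws term by term, and summing `exceedance_pmf(5, 2, 4, x)` over `x = 0, ..., 4` gives exactly 1.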

L. Takács called my attention to the fact that the formula of the number of
exceedances agrees with that of Laplace's law of succession.

Received September 25, 1956; revised May 27, 1957.

Suppose that an event with a priori rectangular probability distribution in
the interval (0, 1) occurs $m^*$ times in $n^*$ trials. Then the probability that it will
occur $x$ times in a following set of $N$ trials is as follows:

\[ \binom{N}{x}\,
 \frac{\int_0^1 z^{m^* + x}(1 - z)^{n^* - m^* + N - x}\,dz}{\int_0^1 z^{m^*}(1 - z)^{n^* - m^*}\,dz} \tag{3} \]

(see [7], p. 31; [8], pp. 68-69). In most textbooks the law of succession appears
for the case $m^* = n^*$, $N = 1$ only (see, e.g., [6], pp. 83-85).

Putting $n^* = n - 1$, $m^* = m - 1$, formula (3) goes into (1).

The classical inverse problem of sampling without replacement is a further
example which leads to the same distribution. An urn contains $L$ balls, the
number of red ones among them being a random variable uniformly distributed on
the numbers $0, 1, 2, \dots, L$. We take $n^*$ random drawings without replacement.
Suppose that $m^*$ out of the $n^*$ balls turn out to be red. Then the a posteriori
probability that the urn contained $r$ red balls is as follows ([9], pp. 109-110):

\[ \frac{\binom{r}{m^*}\binom{L - r}{n^* - m^*}}{\binom{L + 1}{n^* + 1}}. \tag{4} \]

Taking $r = x + m - 1$, $n^* = n - 1$, $m^* = m - 1$, $L = N + n - 1$, formula
(4) goes into (1).


Similarly, it can be shown that some distributions treated in papers [10], [11],
[12], [13], and [14] are of Pólya type.

It is not difficult to find appropriate models illustrating the fact that the above 
different problems lead to identical formulae. The author wishes to give these 
models elsewhere. 


2. The moments. The moments of the number of exceedances can also be derived
from those of Pólya's distribution (see, e.g., [15]).

Kozniewska [16] also determined the mean deviation (the first absolute central
moment) of the Pólya distribution. Applying her results we obtain for the mean
deviation of the number of exceedances


\[ 2 r p_r \left( \frac{n - m + 1}{n + 1} + \frac{N - r}{n + 1} \right), \]


where r — 1 denotes the greatest integer <Nm/(n + 1). 


3. Limiting forms. The limiting forms of the Pólya distribution are treated
by Bricas [15] in detail. The limiting forms of the distribution of the number
of exceedances are special cases of the distributions derived by Bricas. In
particular, the "law of rare exceedances," the formula given in [1] for the limiting
case $n = N \to \infty$, $m$ remaining finite,

\[ \binom{x + m - 1}{x} \left(\frac{1}{2}\right)^{m + x}, \]

provides a distribution of Pascal type.


4. The discrete case. It is known that Laplace's law of succession is valid in
the case of a finite population too (see [8], p. 72; [9], pp. 110-111). The answer
to the problem is given by formula (3) independently of the size of the population
in the case of sampling without replacement.

Similarly the problem of the number of exceedances permits the same
generalization.

An urn contains $L$ balls numbered with different real numbers. A group of
$n$ balls is chosen at random without replacement, and following that a second
group of $N$ balls is chosen without replacement too. We define the number of
exceedances as in the continuous case (see the Introduction). Provided that
$L \ge n + N$, we obtain the same distribution as there, independently of $L$.

REFERENCES
[1] E. J. Gumbel and H. von Schelling, "The distribution of the number of exceedances,"
Ann. Math. Stat., Vol. 21 (1950), pp. 247-262.
[2] E. J. Gumbel, Statistical theory of extreme values and some practical applications,
National Bureau of Standards, Washington, 1954.
[3] E. J. Gumbel, "Elementare Ableitung der Momente für die Zahl der Überschreitungen,"
Mitteilungsblatt für Math. Stat., Vol. 6 (1954), pp. 164-169.
[4] B. Epstein, "Tables for the distribution of the number of exceedances," Ann. Math.
Stat., Vol. 25 (1954), pp. 762-768.
[5] O. Lundberg, On Random Processes and Their Applications to Sickness and Accident
Statistics, Almqvist, Uppsala, 1940.
[6] W. Feller, Probability Theory and Its Applications, John Wiley and Sons, New York,
1950.
[7] P. S. Laplace, Oeuvres complètes, t. VIII, Gauthier-Villars, Paris, 1891.
[8] J. Uspensky, Introduction to Mathematical Probability, McGraw-Hill Book Co., New
York, 1937.
[9] H. Jeffreys, Theory of Probability, 2nd ed., Oxford University Press, Oxford, 1948.
[10] J. G. Skellam, "A probability distribution derived from the binomial distribution by
regarding the probability of success as variable between the sets of trials,"
Jour. Roy. Stat. Soc., Series B, Vol. 10 (1948), pp. 257-261.
[11] J. O. Irwin, "A distribution arising in the study of infectious diseases," Biometrika,
Vol. 41 (1954), pp. 266-268.
[12] J. W. Hopkins, "An instance of negative hypergeometric sampling in practice," Bull.
Int. Stat. Inst., Vol. 34 (1955), pp. 298-306.
[13] C. D. Kemp and A. W. Kemp, "Generalized hypergeometric distributions," Jour. Roy.
Stat. Soc., Series B, Vol. 18 (1956), pp. 202-212.
[14] K. Sarkadi, "On the a priori beta-distribution of fraction defective," Magyar
Tudományos Akadémia Alkalmazott Matematikai Intézetének Közleményei, Vol.
II (1956), pp. 287-293.
[15] M. A. Bricas, Le système de courbes de Pearson et le schéma d'urne de Pólya, Cristou,
Athens, 1949.
[16] J. Kozniewska, "The first absolute central moment for Pólya's distribution,"
Zastosowania Matematyki, Vol. I (1954), pp. 206-211.







TRANSFORMATION OF THE FUNDAMENTAL RELATIONSHIPS IN 
SEQUENTIAL ANALYSIS! 


By H. Blasbalg
Electronic Communications, Inc., Baltimore, Maryland
0. Summary. For the class of distribution functions given by

\[ dP(x, \theta) = \exp[r(\theta)A(x) + s(\theta)B(x)]\,dw(x), \]

it is shown that a set of three transformations can be introduced which completely
define the Sequential Probability Ratio Test for testing a hypothesis $H_0$ against
$H_1$. When the observer specifies the threshold parameters $\theta_0$ and $\theta_1$ corresponding
to the hypotheses $H_0$ and $H_1$ and the strength $\alpha$, $\beta$ of the test, he specifies the
three transformations and hence the Sequential Test. However, there is an
infinity of sets of parameter points $(\theta_0, \theta_1, \alpha, \beta)$ which satisfy the same
transformations and hence define the same Sequential Test. The Operating
Characteristic Function and the Average Sample Number Function are derived in terms of
these transformations.


1. Introduction. Every pair of distributions leads to a two-parameter family
of Sequential Probability Ratio Tests. A one-parameter family of distributions
leads to a two-parameter family of probability ratios, and hence, one might
expect a four-parameter family of Sequential Probability Ratio Tests. This is
typically the case. In this paper it will be shown that there is a class of
one-parameter families of distributions, each of which generates only a three-parameter
family of Sequential Probability Ratio Tests. This includes the well-known
exponential class, of which the best known examples are the normal family of
unknown variance and known mean, the Bernoulli Distribution, and the Poisson
Distribution.

Originally, the author proved that the well-known one-parameter family of
exponential distributions

\[ P(x, \theta) = v(\theta)w(x)\exp\{x f(\theta)\} \tag{1.1} \]

gives rise to this property. This has been recognized implicitly by Girshick [4].
Heuristic arguments supplied by L. J. Savage [3] lead to a more general
one-parameter family of exponential distributions which also exhibit this property.
Briefly, Savage's arguments indicate:

(1) if $r(x)$ and $s(x)$ are logarithms of probability ratios such that

\[ \sum_{i=1}^{n} r(x_i) \le A \iff \sum_{i=1}^{n} s(x_i) \le B \quad \text{for all } n, \]

then

\[ r(x) = \lambda s(x), \qquad A = \lambda B, \]

where $\lambda$ is a constant.

Received June 4, 1956; revised June 19, 1957.

Bringing the results of (1) to bear on the problem of interest, he concludes
that:

(2) if a one-parameter family of distributions $P(x, \theta)$ gives rise to a
three-parameter family of Sequential Probability Ratio Tests then

\[ dP(x, \theta) = \exp[r(\theta)A(x) + s(\theta)B(x)]\,dw(x). \tag{1.2} \]

This family of distributions obviously includes the family in Eq. (1.1). The 
author’s original proof can just as easily be applied to this more general family of 
distributions. We will therefore prove that the one-parameter family of expo- 
nential distributions given in Eq. (1.2) gives rise to only a three-parameter 
family of Sequential Probability Ratio Tests. 

A Sequential Probability Ratio Test for testing a hypothesis $H_0$ against the
alternative hypothesis $H_1$ (where $H_0$ and $H_1$ are mutually exclusive) for a given
a priori probability $\zeta$ is defined by a set of four parameters $\alpha$, $\beta$, $\theta_0$ and $\theta_1$. The
hypothesis $H_0$ is accepted if $\theta \le \theta_0$ and $H_1$ is accepted if $\theta \ge \theta_1$ ($\theta_1 > \theta_0$). Also
$\alpha$ is the probability of accepting $H_1$ when $H_0$ is true (type I error) and $\beta$ is the
probability of accepting $H_0$ when $H_1$ is true (type II error). These parameters
define the Sequential Test and hence its fundamental characteristics, the
Average Sample Number Function (ASN Function) and the Operating
Characteristic Function (OC Function).


2. To obtain the statistical decision regions. Consider the one-parameter
family of statistical distributions given by

\[ dP(x, \theta) = \exp[r(\theta)A(x) + s(\theta)B(x)]\,dw(x). \tag{2.1} \]

To construct an example of such a family, $A(x)$, $B(x)$ can be almost arbitrary
functions and $w(x)$ an arbitrary measure on any measure space with the condition
that $w(x)$ must be non-negative. A little care must be taken to insure that there
will be a reasonably large class of pairs of numbers $(r, s)$ such that

\[ \int \exp[r(\theta)A(x) + s(\theta)B(x)]\,dw(x) = 1. \tag{2.2} \]

The pairs $(r, s)$ which satisfy the normalization (2.2) form a smooth curve and $r(\theta)$, $s(\theta)$ can be
parametric equations which define the curve with the understanding that different
values of the parameter $\theta$ correspond to different points on the curve. Among the
important distributions contained in this class are the Bernoulli, Poisson, and
Gaussian.

In Sequential Analysis the observable upon which a decision is made is given
by the logarithm of the probability ratio (likelihood ratio) which in this case is

\[ z(n) = [r(\theta_1) - r(\theta_0)] \sum_{i=1}^{n} A(x_i) + [s(\theta_1) - s(\theta_0)] \sum_{i=1}^{n} B(x_i). \tag{2.3} \]

The decision regions in Sequential Analysis are defined by the following:
accept the hypothesis $H_1$ when

\[ z(n) \ge \log\frac{1 - \beta}{\alpha} + \log\frac{\zeta}{1 - \zeta}, \tag{2.4} \]

and accept the hypothesis $H_0$ when

\[ z(n) \le \log\frac{\beta}{1 - \alpha} + \log\frac{\zeta}{1 - \zeta}, \tag{2.5} \]

where

$\alpha$ = probability of accepting $H_1$ when $H_0$ is true ($\theta \le \theta_0$),
$\beta$ = probability of accepting $H_0$ when $H_1$ is true ($\theta \ge \theta_1$),
$\zeta$ = a priori probability of the hypothesis $H_1$.

We now substitute Eq. (2.3) into Eqs. (2.4) and (2.5) and obtain, taking $r(\theta_1) > r(\theta_0)$,

\[ \sum_{i=1}^{n} A(x_i) + \frac{s(\theta_1) - s(\theta_0)}{r(\theta_1) - r(\theta_0)} \sum_{i=1}^{n} B(x_i)
 \ge \frac{\log\dfrac{1 - \beta}{\alpha} + \log\dfrac{\zeta}{1 - \zeta}}{r(\theta_1) - r(\theta_0)} \tag{2.6} \]

corresponding to $H_1$ and

\[ \sum_{i=1}^{n} A(x_i) + \frac{s(\theta_1) - s(\theta_0)}{r(\theta_1) - r(\theta_0)} \sum_{i=1}^{n} B(x_i)
 \le \frac{\log\dfrac{\beta}{1 - \alpha} + \log\dfrac{\zeta}{1 - \zeta}}{r(\theta_1) - r(\theta_0)} \tag{2.7} \]

corresponding to $H_0$. Let us now define the following three transformations:

\[ a = \frac{\log\dfrac{1 - \beta}{\alpha} + \log\dfrac{\zeta}{1 - \zeta}}{r(\theta_1) - r(\theta_0)}, \tag{2.8} \]

\[ b = \frac{\log\dfrac{\beta}{1 - \alpha} + \log\dfrac{\zeta}{1 - \zeta}}{r(\theta_1) - r(\theta_0)}, \tag{2.9} \]

\[ c = \frac{s(\theta_1) - s(\theta_0)}{r(\theta_1) - r(\theta_0)}. \tag{2.10} \]

Substituting (2.8), (2.9), and (2.10) into (2.6) and (2.7) yields

\[ \sum_{i=1}^{n} A(x_i) + c \sum_{i=1}^{n} B(x_i) \ge a \tag{2.11} \]

for the acceptance of $H_1$ and

\[ \sum_{i=1}^{n} A(x_i) + c \sum_{i=1}^{n} B(x_i) \le b \tag{2.12} \]

for the acceptance of $H_0$. It is therefore seen that the $a$, $b$, $c$ transformations completely define the decision
regions of the Sequential Test for the one-parameter family of distributions
considered. Furthermore, for a given a priori probability $\zeta$, there is an infinity of
sets of values $(\theta_0, \theta_1, \alpha, \beta)$ which yield the same decision regions.
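A small numerical sketch may help to see how different parameter points collapse onto one test. It assumes the three transformations have the form $a = [\log((1-\beta)/\alpha) + \log(\zeta/(1-\zeta))]/(r_1 - r_0)$, $b = [\log(\beta/(1-\alpha)) + \log(\zeta/(1-\zeta))]/(r_1 - r_0)$, $c = (s_1 - s_0)/(r_1 - r_0)$ with $r_1 > r_0$. The normal family with unit variance and unknown mean is used as an example, with $A(x) = x$, $B(x) = 1$, $r(\theta) = \theta$, $s(\theta) = -\theta^2/2$; the function names and numerical values are illustrative assumptions.

```python
import math


def abc_transformations(r0, r1, s0, s1, alpha, beta, zeta=0.5):
    """Return (a, b, c) for thresholds theta_0, theta_1 and error strengths alpha, beta.

    r0, r1 stand for r(theta_0), r(theta_1) and s0, s1 for s(theta_0), s(theta_1);
    r1 > r0 is assumed so that the inequalities keep their direction.
    """
    dr = r1 - r0
    prior_term = math.log(zeta / (1 - zeta))
    a = (math.log((1 - beta) / alpha) + prior_term) / dr
    b = (math.log(beta / (1 - alpha)) + prior_term) / dr
    c = (s1 - s0) / dr
    return a, b, c


def decide(xs, A, B, a, b, c):
    """Run the test on observations xs; return the decision and the sample size used."""
    sum_a = 0.0
    sum_b = 0.0
    for k, x in enumerate(xs, start=1):
        sum_a += A(x)
        sum_b += B(x)
        statistic = sum_a + c * sum_b
        if statistic >= a:
            return "H1", k
        if statistic <= b:
            return "H0", k
    return "continue", len(xs)


def normal_mean_abc(theta0, theta1, alpha, beta):
    """(a, b, c) for the unit-variance normal family: r(theta) = theta, s(theta) = -theta^2/2."""
    return abc_transformations(theta0, theta1, -theta0 ** 2 / 2, -theta1 ** 2 / 2, alpha, beta)
```

In this example the parameter points $(0, 1, 0.1, 0.1)$ and $(-0.5, 1.5, 1/82, 1/82)$ give the same triple $a = \log 9$, $b = -\log 9$, $c = -1/2$, so `decide` stops at the same observation with the same decision for either specification.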

For a given a priori probability ¢, it is known that the Sequential Probability 
Ratio Test is optimum [1] in the sense that the ASN Function at the parameter 
point (@ , 6; , a, 8) is less or equal to the ASN Function for any other Sequential 
Test. Since these parameters specify the a, b, c transformations of the Sequential 
Probability Ratio Test, this test is also optimum for a given set a, b, c. For a 
given a priori probability ¢ and a given set a, b, c there exists an infinity of pa- 
rameter points (%, 6:, a, 8) which satisfy the transformations. We therefore 
conclude that a given Sequential Test is optimum at an infinity of parameter 
points (@ , 6; , a, 8) which satisfy the given transformations. 

One can easily express Wald’s approximations to the OC and ASN Functions 
in terms of the a, b, c transformations. The OC Function can be obtained by 
solving Eq. (2.8), (2.9), and (2.10) for the appropriate variables, substituting 
these into the parametric equations which define the OC Function [2] and then 
introducing a new transformation, 


(2.13)  $u = \exp\{h[r(\theta_1) - r(\theta_0)]\}.$


As h ranges over the entire real line, u ranges over the positive half of the real 
line. The parametric equations for the OC Function are then given by 


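(2.14)  $L(u) = \dfrac{u^{a} - 1}{u^{a} - u^{b}}$

and

(2.15)  $E_\theta\{u^{A(x) + cB(x)}\} = 1.$

At the indeterminate point $u = 1$,

(2.16)  $L(1) = \dfrac{a}{a - b}.$

The point $u = 1$ corresponds to the value of $\theta$ for which

(2.17)  $E_\theta[A(x)] + cE_\theta[B(x)] = 0.$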


In a similar manner the ASN Function [2] can be shown in terms of the new 
transformations as 


(2.18)  $E_\theta(n) = \dfrac{bL(u) + a[1 - L(u)]}{E_\theta[A(x)] + cE_\theta[B(x)]}.$

When $u = 1$ (so that $E_\theta[A(x)] + cE_\theta[B(x)] = 0$),

(2.19)  $E_\theta(n) = \dfrac{-ab}{E_\theta\{[A(x) + cB(x)]^2\}}.$

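
The approximations above lend themselves to direct numerical evaluation. The following is a minimal sketch, assuming the Wald-type forms $L(u) = (u^a - 1)/(u^a - u^b)$ with limit $a/(a - b)$ at $u = 1$, and $E(n) = \{bL + a(1 - L)\}/E[A + cB]$ with limit $-ab/E[(A + cB)^2]$ at $u = 1$, under the convention $b < 0 < a$. The function names and the arguments `drift` and `second_moment` are illustrative and do not come from the paper.

```python
import math

def oc(u, a, b):
    """Approximate OC value L(u), assuming boundaries b < 0 < a."""
    if math.isclose(u, 1.0):
        return a / (a - b)              # limiting value at the indeterminate point
    return (u**a - 1.0) / (u**a - u**b)

def asn(u, a, b, drift=None, second_moment=None):
    """Approximate expected sample size.

    drift         : E[A(x) + c B(x)] at the parameter point belonging to u
    second_moment : E[(A(x) + c B(x))^2], needed only when u == 1
    """
    if math.isclose(u, 1.0):
        return -a * b / second_moment
    L = oc(u, a, b)
    return (b * L + a * (1.0 - L)) / drift
```

For instance, `oc(1.0, 2.0, -3.0)` returns 0.4, and values of `u` close to 1 give nearly the same result, which is a quick check that the limiting form is consistent with the general one.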
The author wishes to express his thanks to L. J. Savage for his interest in this 
problem and his many useful suggestions. 
WHEN DIFFERENT PAIRS OF HYPOTHESES HAVE THE SAME
FAMILY OF LIKELIHOOD-RATIO TEST REGIONS¹


By LEONARD J. SAVAGE
University of Chicago


Blasbalg [1], in this issue of these Annals, shows that certain families of dis- 
tributions are especially simple, or degenerate, from the point of view of se- 
quential tests. The main object of this note is to show briefly that these are (at 
least practically) the only families thus degenerate; some preliminary and related 
conclusions are also demonstrated. 


Let $F$ and $G$ be a pair of probability measures on a space $X$ with elements $x$,
and let $\ell$ be the logarithm of the likelihood ratio of $F$ with respect to $G$. $\ell$ is of
course defined only mod $(F + G)$, that is, only up to sets simultaneously of $F$
and $G$ measure 0. If $x_i$ is a sequence of values of $x$, then a likelihood-ratio critical
region in $X^n$ is defined by

(1)  $R(A, n) = \left\{(x_1, \cdots, x_n) : \sum_{1}^{n} \ell(x_i) \le A\right\}.$

The innocuous ambiguity of $\ell$ of course induces corresponding ambiguity in $R$.
This family of sets $R$ is simplest to study when the distribution of $\ell$ is non-atomic
under both $F$ and $G$ and when the essential range of $\ell$ is $(-\infty, \infty)$ under
both $F$ and $G$; less rigorously, when $\ell$ has no plateaus and takes on almost all
values. It is henceforth assumed that $F$ and $G$ are such that $\ell$ has these properties.
Consequently, if $R(A, n) = R(B, n)$, for some $n \ge 1$, then $A = B$.

Now introduce a second pair of distributions $F'$ and $G'$ also on $X$, together with
their $\ell'$ and their critical regions $R'$, and subject $F'$ and $G'$ to the same conditions
as $F$ and $G$.

When can some R’ be the same as some R? The answers for n = 1 and 2 are 
easy but uninteresting and will be deferred. 

THEOREM 1. If (for some $A$ and $A'$, some $n \ge 3$, and some representation $\ell$ and
$\ell'$ of the logarithms of the likelihood ratios) $R(A, n) = R'(A', n)$, then $\ell' = \alpha\ell + \beta$
for some $\alpha > 0$ and for some $\beta$, and $A' = \alpha A + n\beta$. Conversely, if $\ell' = \alpha\ell + \beta$,
for some $\alpha > 0$, then $R(A, n) = R'(\alpha A + n\beta, n)$ for all $A$ and $n$, $n \ge 1$.

PROOF. The second part of the theorem is obvious.² The full proof of the
first part will be clear from the proof for $n = 3$.
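
The converse half of Theorem 1 can be checked mechanically. The sketch below is only an illustration: it takes the hypothetical log likelihood ratio $\ell(x) = x$, draws random sample points with exact rational coordinates, and confirms that membership in $R(A, n)$ coincides with membership in $R'(\alpha A + n\beta, n)$ when $\ell' = \alpha\ell + \beta$. The names `in_region` and `regions_agree` are not from the paper.

```python
from fractions import Fraction
import random

def in_region(loglr, xs, bound):
    """True when the sample xs lies in the region {sum of loglr(x_i) <= bound}."""
    return sum(loglr(x) for x in xs) <= bound

def regions_agree(alpha, beta, A, n, trials=2000, seed=1):
    """Compare R(A, n) with R'(alpha*A + n*beta, n) on random rational points."""
    rng = random.Random(seed)
    ell = lambda x: x                            # hypothetical log likelihood ratio
    ell_prime = lambda x: alpha * ell(x) + beta  # affine image, alpha > 0
    A_prime = alpha * A + n * beta
    for _ in range(trials):
        xs = [Fraction(rng.randint(-30, 30), 10) for _ in range(n)]
        if in_region(ell, xs, A) != in_region(ell_prime, xs, A_prime):
            return False
    return True
```

Exact rational arithmetic is used so that points lying on the boundary of a region are classified identically on both sides, without rounding effects.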

If $\ell(u) < \ell(v)$, then $\ell'(u) < \ell'(v)$. Indeed, if $\ell(u) < \ell(v)$, there clearly are
$x$ and $y$ such that

(2)  $\ell(x) + \ell(y) + \ell(u) \le A, \qquad \ell(x) + \ell(y) + \ell(v) > A.$

This implies the corresponding relations for $\ell'$ and $A'$, which in turn imply that
$\ell'(u) < \ell'(v)$. It follows that $\ell'(x)$ is a strictly increasing function, $t[\ell(x)]$, of
$\ell(x)$. From the density of the range of $\ell$, it follows that $t$ has a continuous and
one-to-one extension to all the real numbers. (Note that this line of argument
applies even if $n = 2$.)

Now, working with this extension of $t$, note that

(3)  $a + b + c = A$ if and only if $t(a) + t(b) + t(c) = A'$,

whence (making essential use of the fact that $n > 2$)

(4)  $t(a + b) + t(0) + t(c) = A'$ if and only if $t(a) + t(b) + t(c) = A'$,

and therefore

(5)  $[t(a + b) - t(0)] = [t(a) - t(0)] + [t(b) - t(0)].$

This shows that the strictly increasing and continuous function $t$ is of the form
$\alpha\ell + \beta$ with $\alpha > 0$. Finally, in view of (3), $A' = \alpha A + 3\beta$, and the proof for
$n = 3$ is complete.

COROLLARY 1. If the hypothesis of the first part of Theorem 1 holds with fixed
$A$ and $A'$ for two different values of $n$ (of which only one need be as great as 3), then
$\ell = \alpha\ell'$ mod $(F + G + F' + G')$, with $\alpha > 0$, for all representations of $\ell$ and $\ell'$.

If $\ell = \alpha\ell'$ mod $(F + G + F' + G')$ for some $\alpha > 0$, call $(F, G)$ parallel to
$(F', G')$. Parallelism is obviously an equivalence relation.


² No condition on the distribution of $\ell$ and $\ell'$ is needed for such "sufficiency" conclusions
as this.
The techniques used in proving Theorem 1 lead easily to the following conclusions
for $n = 1$ and 2. $R(A, 1) = R'(A', 1)$ if and only if $\ell(x) \le A$ where
and only where $\ell'(x) \le A'$. $R(A, 2) = R'(A', 2)$ if and only if $\ell'(x) = t[\ell(x)]$
with $t$ continuous, strictly increasing and subject to the identity $t(\ell) +
t(A - \ell) = A'$. Another way to describe these functions is to prescribe that
$t(\ell) = q(\ell - \tfrac{1}{2}A) + \tfrac{1}{2}A'$, where $q$ is strictly monotone, continuous, and
antisymmetric.
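
A small numerical illustration of the case $n = 2$ may help. The sketch below assumes the particular antisymmetric, strictly increasing choice $q(v) = v^3 + v$ (any other such $q$ would do) and verifies on a rational grid both the identity $t(\ell) + t(A - \ell) = A'$ and the agreement of the two-observation regions. The names `make_t`, `identity_holds`, and `regions_agree` are illustrative only.

```python
from fractions import Fraction

def make_t(q, A, A_prime):
    """Build t(l) = q(l - A/2) + A'/2 from an antisymmetric increasing q."""
    return lambda l: q(l - A / 2) + A_prime / 2

q = lambda v: v**3 + v                  # antisymmetric and strictly increasing
A, A_prime = Fraction(1), Fraction(5)   # arbitrary illustrative bounds
t = make_t(q, A, A_prime)

grid = [Fraction(i, 4) for i in range(-12, 13)]

# The defining identity t(l) + t(A - l) = A'.
identity_holds = all(t(l) + t(A - l) == A_prime for l in grid)

# Same two-observation region: l1 + l2 <= A exactly when t(l1) + t(l2) <= A'.
regions_agree = all(
    (l1 + l2 <= A) == (t(l1) + t(l2) <= A_prime)
    for l1 in grid for l2 in grid
)
```

The nonlinear $t$ preserves the region for $n = 2$, which is why the affine conclusion of Theorem 1 requires $n \ge 3$.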

A sequential likelihood-ratio test for $F$ and $G$ (together with a determination of
$\ell$) defines three critical sets $S$, $T$, $U$ in each $X^n$, which can be described formally
thus:

(6)  $\ell_n = \sum_{1}^{n} \ell(x_i),$

(7)  $S(A, B; n) = \{(x_1, \cdots, x_n) : B \le \ell_m \le A,\ m < n;\ B > \ell_n\},$
(8)  $T(A, B; n) = \{(x_1, \cdots, x_n) : B \le \ell_m \le A,\ m \le n\},$
(9)  $U(A, B; n) = \{(x_1, \cdots, x_n) : B \le \ell_m \le A,\ m < n;\ \ell_n > A\}.$


It can happen that, for each pair $A$, $B$ with $B < A$, there exist $A'$ and $B'$ such
that $S(A, B; n) = S'(A', B'; n)$, $T(A, B; n) = T'(A', B'; n)$, and $U(A, B; n) =
U'(A', B'; n)$ for all $n$. It is clearly sufficient for this that $(F, G)$ be parallel to
$(F', G')$. Parallelism is also necessary (even if $n$ is confined to the range 1 and 2)
as the next two paragraphs prove.
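
The three sets can be made concrete with a short classifier. The sketch below assumes the reading in which a sample point belongs to $S$ at stage $n$ if the cumulative sum stays in $[B, A]$ before stage $n$ and falls below $B$ at stage $n$, to $U$ if it instead rises above $A$ at stage $n$, and to $T$ if it stays in $[B, A]$ through stage $n$. The function names are hypothetical; the inputs are the successive values $\ell(x_i)$.

```python
from fractions import Fraction

def classify(increments, A, B):
    """Return 'S', 'T' or 'U' for the stage n = len(increments).

    None is returned when the path left [B, A] before stage n, so that the
    point belongs to none of the three sets at this stage.
    """
    total = 0
    n = len(increments)
    for m, inc in enumerate(increments, start=1):
        total += inc
        if total < B:
            return "S" if m == n else None
        if total > A:
            return "U" if m == n else None
    return "T"

def scaled(increments, alpha):
    """Increments of a parallel pair, l' = alpha * l with alpha > 0."""
    return [alpha * v for v in increments]
```

With $\ell' = \alpha\ell$, $A' = \alpha A$, and $B' = \alpha B$ the classification of every path is unchanged, which is the sufficiency of parallelism noted in the text.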

Studying $n = 1$, you see that $A'$ is determined by $A$ alone and $B'$ by $B$. Continuing
with $n = 1$, if $\ell(u) < \ell(v)$ it follows that $\ell'(u) < \ell'(v)$. Therefore $\ell'$ is a
strictly increasing function $t$ of $\ell$, and, in view of the density of the ranges of
$\ell$ and $\ell'$, $t$ is extendible to a continuous, strictly increasing, function. Also
$A' = t(A)$, and $B' = t(B)$.

Now turn to $n = 2$. Consider two real numbers $c$ and $d$ with $d \ge 0$. Letting
$c + d = A$, you see that, since $c \le A$, $t(c) + t(d) \le A' = t(c + d)$. But, in
view of the continuity and strict monotony of $t$, equality actually obtains. A dual
argument leads to the same conclusion if $d < 0$. Therefore, $t$ is linear, homogeneous
and increasing, so $\ell$ and $\ell'$ are indeed proportional.

The possibility that $S$, $T$, $U$ equals $S'$, $T'$, $U'$ only for some one quadruple of
parameters $A$, $B$, $A'$, $B'$ and all $n$ may be of interest, though the answer is a
little complicated. The following conditions are almost obviously sufficient:
$(A' - B') = \alpha(A - B)$, $\ell' = \alpha\ell + A' - \alpha A$ for $\ell \in [B, A]$, $\ell' = \alpha\ell$ for
$\ell \in [B - A, A - B]$, $\alpha > 0$. These conditions are necessary (even if $n$ is confined
to the range 1, 2, 3) as the next paragraph proves. Note that, if $[B, A]$ and
$[B - A, A - B]$ intersect, as they do in the usual configuration $B < 0 < A$,
these conditions simplify to: $A' = \alpha A$, $B' = \alpha B$, and $\ell' = \alpha\ell$ for $\ell \in [B, A] \cup
[B - A, A - B]$.

The following facts are easily checked successively. For $\ell \in [B, A]$, $\ell' = r(\ell)$
is an invertible function connecting the values of $\ell \in [B, A]$ and $\ell' \in [B', A']$.
$r$ is strictly increasing and has a continuous extension, and $r(A) = A'$, $r(B) = B'$.
Similarly, for $\ell \in [B - A, A - B]$, $\ell' = t(\ell)$, with $t$ strictly increasing with a
continuous extension. $t$ is linear and homogeneous. Finally, $r$ is of the form
$\alpha\ell + A' - \alpha A$ and $\alpha\ell + B' - \alpha B$.

Now let $F(s)$ be a one-parameter family of probability measures on $X$ parametrized
by $s$ and defined by probability densities $f(s)$ with respect to a fixed
($\sigma$-finite) measure $\mu$, and let $\ln f(s) = h(s)$. Since the introduction of sequential
likelihood-ratio tests, it has often been remarked that certain important families
$F(s)$ are especially simple with respect to sequential tests, as has recently been
emphasized in [1]. Indeed, it is the special feature of these families that every
pair $(s, s')$ is imbedded in a one-parameter family of pairs, say $[s(\theta), s'(\theta)]$, such
that each $[F(s(\theta)), F(s'(\theta))]$ resembles $[F(s), F(s')]$ and is in fact parallel to it in
the technical sense of this paper. What does this simplicity imply about the
structure of the family $F(s)$?

One graphic way to couch the answer is to remark that, for each $s$, $h(s)$ is a
vector in a function space. The family $h(s)$ is a curve in this space. And the technical
condition of parallelism is simply that the chord $[h(s) - h(s')]$ be parallel
in the ordinary geometrical sense to the chords $[h(s(\theta)) - h(s'(\theta))]$. Thus $h(s)$
needs to be a curve of which every chord is parallel to many others (I am
purposely a little vague in order to admit more than one possibly equally interesting
interpretation of "many others"). This condition is obviously met in a wide
sense if $h(s)$ is any plane curve, that is, if $h(s)$ can be represented as


(10)  $h_0 + \eta_1(s)h_1 + \eta_2(s)h_2,$


where $h_0$, $h_1$, $h_2$ are fixed vectors (that is, real-valued functions of $x$) and $\eta_1(s)$,
$\eta_2(s)$ are real-valued functions of $s$. For narrower senses, regularity conditions
might be imposed on $\eta_1(s)$, $\eta_2(s)$.

Moreover, it is presumably only a plane curve $h(s)$ that can satisfy the condition
in any way that would be considered natural. By an unnatural way, I
here mean resort to a space filling curve or the like. By a natural way, I mean
one with enough regularity to justify something like the following proof that
$h(s)$ must be a plane curve.


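(11)  $[h(s) - h(0)] = \lambda(s, \Delta s)\,[h(s + \Delta s) - h(\phi(s, \Delta s))]$

      $= [h(s) - h(0)] + \Delta s \left\{ \dfrac{\partial \lambda}{\partial \Delta s}(s, 0)\,[h(s) - h(0)] + \left[ \dot{h}(s) - \dot{h}(0)\, \dfrac{\partial \phi}{\partial \Delta s}(s, 0) \right] \right\} + o(\Delta s),$

for $s > 0$ but sufficiently small,


where the dot indicates the derivative with respect to $s$. Therefore,


(12)  $\dfrac{d}{ds}[h(s) - h(0)] = -\lambda'(s)[h(s) - h(0)] + \phi'(s)\dot{h}(0),$

for $s > 0$ but sufficiently small,


using evident abbreviations.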
Multiplying through by an arbitrary linear functional, you see that, for given
$\lambda'$ and $\phi'$, (12) is in effect a collection of first order, linear, ordinary differential
equations that can be treated separately. In particular, the whole curve $h(s)$
(for the range where (12) is satisfied) is determined through (12) by the value of
$h$ at any one value of $s$, say for symbolic simplicity, at $s = 1$. But it is easily
verified that (12) is solved by an $h(s)$ of the form


$h(0) + \alpha(s)[h(1) - h(0)] + \beta(s)\dot{h}(0)$, with $\alpha(1) = 1$, $\beta(1) = 0$.


In fact, $\alpha$ and $\beta$ are obviously determined by the calculation


(13)  $\alpha'(s)[h(1) - h(0)] + \beta'(s)\dot{h}(0) = -\lambda'(s)\{\alpha(s)[h(1) - h(0)] + \beta(s)\dot{h}(0)\} + \phi'(s)\dot{h}(0),$
(14)  $\alpha'(s) = -\lambda'(s)\alpha(s),$
(15)  $\beta'(s) = -\lambda'(s)\beta(s) + \phi'(s).$


Thus the initial segment of $h$ is a plane curve, and by piecing, this conclusion can
be extended to the whole curve.

To summarize, the logarithmic densities of a family that is, so to speak, degenerate
with respect to sequential tests can with more than sufficient generality
be represented by (10). The corresponding probability densities are of the form


(16)  $f(x, s) = f_0(x)\, f_1(x)^{\eta_1(s)}\, f_2(x)^{\eta_2(s)}.$


Such families exist in great abundance. The choice of $f_0$, $f_1$, and $f_2$ is nearly arbitrary,
subject only to $f_0 \ge 0$, $f_1, f_2 > 0$ (at least where $f_0 > 0$) and mild integrability
conditions. Typically, $\eta_1$ and $\eta_2$ will, for given $f_0$, $f_1$, $f_2$, be constrained
to lie on a convex curve in order that $f$ shall be normalized, that is, integrate to 1.
This curve can be parametrized arbitrarily, to complete the construction. See
[1] for important examples.

The form (16) has a natural extension to families with two or more parameters.
For example, the two-parameter, bivariate, normal family corresponding to two
random variables with means 0, equal variances $\sigma^2$, and correlation coefficient
$\rho$ is of the form


fo(x)fu(x)" fala) faa)?” 


REFERENCE 


[1] H. BLASBALG, "Transformation of the fundamental relationships in sequential analysis,"
this issue of these Annals.
ON THE DETECTION OF DEFECTIVE MEMBERS OF LARGE
POPULATIONS¹


By ANDREW STERRETT

Denison University


1. Introduction. The Annals of Mathematical Statistics contained a note by 
Robert Dorfman [2] explaining an efficient method for eliminating all defective 
members of certain types of large populations. In particular, the application con- 
sidered was the weeding out of all syphilitic men called up for induction into the 
armed forces. Instead of testing each blood sample individually, Dorfman pro- 
posed to pool k samples for a single analysis. The presence of syphilitic antigen 
in the pool led Dorfman to make k individual tests; the absence of the syphilitic 
antigen allowed him to clear k men with one test. One purpose of the note was 
to find the optimum k and the efficiency of the method for various prevalence 
rates of defectives. The purpose of this paper is to increase the efficiency of 
detection. 
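
For orientation, the cost of Dorfman's plan can be computed directly. The sketch below assumes that samples are independently defective with probability $p$, so that a pool of $k$ is positive with probability $1 - (1 - p)^k$ and the expected number of analyses per sample is $1/k + 1 - (1 - p)^k$. The function names and the search limit `k_max` are illustrative.

```python
def dorfman_tests_per_sample(k, p):
    """Expected analyses per sample under Dorfman's plan.

    One pooled test is shared by k samples; k individual tests follow
    whenever the pool is positive.
    """
    return 1.0 / k + 1.0 - (1.0 - p) ** k

def dorfman_optimum(p, k_max=100):
    """Return (best k, expected analyses per hundred samples)."""
    best_k = min(range(2, k_max + 1),
                 key=lambda k: dorfman_tests_per_sample(k, p))
    return best_k, 100.0 * dorfman_tests_per_sample(best_k, p)
```

Under these assumptions, a prevalence rate of 0.01 gives an optimum pool size of 11 and between 19 and 20 analyses per hundred samples, in line with the savings of about 80 per cent quoted for Dorfman's method.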

Rather than analyze each sample of a defective pool, it is proposed to make 
individual tests only until a defective is found. For small prevalence rates of 
defective members it is likely that a new pool formed from the untested samples 
will prove to be negative. If so, the work is finished for that pool; if not, one 
should test individuals again—but only until a defective is found. Continuing 
this procedure until a negative pool is found will increase Dorfman’s efficiencies 
by about 6 per cent (from a savings over individual inspection of 80 per cent to a
savings of 86 per cent for a prevalence rate of defectives equal to 0.01).
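
The proposed procedure is easy to state as a counting routine. The sketch below is one possible reading, with two conventions that are assumptions rather than statements of the paper: a positive pool consisting of a single sample is taken to identify that sample without a further test, and no other deductions are made (every sample examined individually costs one test). The expectation is obtained by brute-force enumeration, so it is practical only for small $k$; the function names are illustrative.

```python
from itertools import product

def sterrett_tests(defective):
    """Count the tests used on one pool; defective is a sequence of booleans."""
    tests = 0
    start = 0
    k = len(defective)
    while start < k:
        tests += 1                          # pooled test of samples start..k-1
        if not any(defective[start:]):
            break                           # negative pool: all remaining cleared
        if k - start == 1:
            break                           # positive pool of one sample
        j = start
        while True:                         # individual tests until a defective
            tests += 1
            if defective[j]:
                break
            j += 1
        start = j + 1                       # re-pool the untested samples
    return tests

def expected_tests(k, p):
    """Exact expected number of tests for a pool of k, by enumeration."""
    total = 0.0
    for pattern in product([False, True], repeat=k):
        i = sum(pattern)
        total += p**i * (1.0 - p)**(k - i) * sterrett_tests(pattern)
    return total
```

With these conventions a pool with no defectives costs one test and a pool of $k$ defectives costs $2k - 1$ tests.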


2. Notation. The probability that a pool containing $k$ samples has exactly $i$
defective members is given by $\Pr_k(i)$; the expected value of the number of analyses
required to isolate the $i$ defectives by the proposed method is $E_k(i)$.

Given a universe of $N$ elements with a proportion $p$ defective, $E(N, k, p)$ is the
total expected value of the number of analyses required to investigate the universe
by pooling $k$ samples at a time.


3. Procedure. Using the definition of expectation of a random variable, 


(1)  $E(N, k, p) = \dfrac{N}{k} \sum_{i=0}^{k} \{\Pr_k(i)\, E_k(i)\}.$


Before E(N, k, p) can be evaluated it must be shown that 


(2)  $E_k(i) = \dfrac{i}{i + 1}\,k + i + 1 + \dfrac{i}{i + 1} - \dfrac{2i}{k}.$

When there are no defective elements in a pool, one laboratory analysis will
suffice. That is, $E_k(0) = 1$ as Eq. (2) verifies.


Received January 8, 1957. 
1 The material in this article is derived from a dissertation submitted to the University 
of Pittsburgh, June, 1956. 
[Figure: Comparison of economies resulting from testing by two group methods. Axes: lab analyses per hundred blood tests; size of group. Curve label: Dorfman method.]


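(3)  $E_k(n) = 1 + \dfrac{n}{k}\{1 + E_{k-1}(n - 1)\} + \sum_{j=1}^{k-n} \left[ \left( \prod_{i=1}^{j} \dfrac{k - (i + n - 1)}{k - (i - 1)} \right) \dfrac{n}{k - j} \{(j + 1) + E_{k-j-1}(n - 1)\} \right].$
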
The first term on the right-hand side of Eq. (3) represents the initial group
test. The factor $n/k$ in the next term is the probability that the first sample
tested is defective; the factor $\{1 + E_{k-1}(n - 1)\}$ is the sum of the number of
tests required to find a defective on the first trial and the average number of
tests needed to find $(n - 1)$ defectives in the remaining pool of $k - 1$ members.
The probability that the first $j$ samples are not defective is


$\prod_{i=1}^{j} [k - (i + n - 1)]/[k - (i - 1)],$


while the probability that the $(j + 1)$st element tested is defective is $n/(k - j)$.
The number of tests required to find the first defective is $(j + 1)$, and
$E_{k-j-1}(n - 1)$ is the expected number of tests required to find the remaining
$n - 1$ defectives among the $k - (j + 1)$ members.







Equation (3) reduces to the form given by Eq. (2) when values of
$E_{k-j-1}(n - 1)$ obtained from Eq. (2) are properly substituted. The proof,
then, of the formula for $E_k(i)$ follows by induction.


4. An approximation to E(N, k, p). The probabilities connected with all but 
the first few terms of E(N, k, p) are insignificant for small p. Therefore an ap- 
proximation to E(N, k, p) is defined as 


$E'(N, k, p) = \dfrac{N}{k} \sum_{i=0}^{m} \{\Pr_k(i)\, E_k(i)\},$


where $m$ is the smallest integer such that $\sum_{i=0}^{m} \Pr_k(i) > 0.99$.
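
The truncation point $m$ is simple to compute once a form for $\Pr_k(i)$ is fixed. The sketch below assumes a binomial law, $\Pr_k(i) = \binom{k}{i} p^i (1 - p)^{k - i}$, which is appropriate for a large population but is an assumption here; the function name and the `level` argument are illustrative.

```python
from math import comb

def smallest_m(k, p, level=0.99):
    """Smallest m with sum_{i=0}^{m} Pr_k(i) > level, under a binomial Pr_k."""
    cumulative = 0.0
    for m in range(k + 1):
        cumulative += comb(k, m) * p**m * (1.0 - p)**(k - m)
        if cumulative > level:
            return m
    return k    # reached only if rounding keeps the total at or below level
```

For example, with $k = 10$ and $p = 0.01$ the first two terms already exceed 0.99, so that $m = 1$ and only $m + 1 = 2$ terms of the sum are needed.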


TABLE I
Comparison of efficiencies by grouping under the Dorfman plan and the new method 


Dorfman Plan New Method 


Lab analyses 


Lab analyses | (» + 1)* ork Optimum & per hundred Difference 


Optimum & 


per hundred E’(k, p) 


32 ) : 47 4 
0. 19 ; 30 s 
0. 15 ‘ 22 
0. 12 : 20 12 
0. 1] 7 2 16 14 
0. 3 11 
0. ) 
0. 
0. 
0. 
0. 
0.08 
0. 
Q. 
0. 
0.1: 
0.1: 
0. 
O.1 
0.4% 
0.2 
0.25 
0.3 
0. 
0. ° 99 
0.3% 
0.35 3 106 
0.38 


27 
32 
35 
39 
42 


w 


Sswwww 


” 
48


. 


51 


54 
57 


or 


61 
65 
74 


emRwWwWwww w 
84

86 

87 

90 

93 

96 10 
100 


Www wh hh mh 


tw ty w WH 


~ 
Nw te 


nwt 


to 
* $(m + 1)$ is the number of subdivisions into which each member of the pool should be
subdivided in order to be 99 per cent sure of knowing the history of the pool before
exhausting any member.
The number of terms required to calculate $E'(N, k, p)$ is $m + 1$. This is also
the minimum number of subdivisions into which an element must be divided
by a laboratory technician to be at least 99 per cent confident that he will know
the history of the group before exhausting any element. Values of $m + 1$ corresponding
to many $p$'s will be found in Table I.


5. An error expression. Define $\delta$ to be $E(N, k, p) - E'(N, k, p)$. In other words,
$\delta = (N/k) \sum_{i=m+1}^{k} \{\Pr_k(i)\, E_k(i)\}$.

Since $E_k(k) = 2k - 1$, it follows that $\delta \le (2k - 1)(N/k) \sum_{i=m+1}^{k} \Pr_k(i)$.

Arbitrarily, $m$ is chosen large enough to make $\sum_{i=0}^{m} \Pr_k(i)$ greater than 0.99.
Therefore, $\sum_{i=m+1}^{k} \Pr_k(i)$ is less than 0.01. Consequently,


$\delta < (2k - 1)(N/k)(0.01) = [2 - (1/k)](N/100).$


That is, $\delta$ is less than $2 - (1/k)$ for each 100 items of the universe. This is a
generous error bound since it was assumed that every pool containing more than $m$
defectives contains $k$ defective elements.


6. Conclusions. Using $E'(N, k, p)$, the optimum $k$ and their corresponding
economies are determined for many prevalence rates in the range $0.001 <
p \le 0.38$. Values of $E'(N, k, p)$ are calculated for $k = 4, 8, 12, \cdots$ and at the
intermediate integral values necessary to insure that the minimum value is
found. Results of this work and comparison with Dorfman's efficiencies are
found in Table I.


REFERENCES
[1] STERRETT, ANDREW, "An efficient method for the detection of defective members of
large populations," Ph.D. dissertation, University of Pittsburgh, 1956.
[2] DORFMAN, ROBERT, "The detection of defective members of large populations," Ann.
Math. Stat., Vol. 14 (1943), pp. 436-440.
MAXIMUM LIKELIHOOD ESTIMATES IN A SIMPLE QUEUE


By A. BRUCE CLARKE¹
University of Michigan


0. Summary. The problem of obtaining maximum likelihood estimates for 
the parameters involved in a stationary single-channel, Markovian queuing 
process is considered. A method of taking observations is presented which simpli- 
fies this problem to that of determining a root of a certain quadratic equation. 
A useful and even simpler rational approximation is also studied. 


1. Introduction. By a simple queue is meant a queue having a Poisson input and 
a negative exponential service time (type M/M/1 in the notation of Kendall 


Received November 19, 1956. 
1 Research under contract with the Office of Naval Research carried out at Cornell 
University. 
[1]). That is, the arrival of individuals at the tail of the queue is assumed to be a
Poisson process with parameter $\lambda$ ($\lambda$ = the mean number of arrivals per unit
time), while the time required for an individual at the head of the queue to pass
through the service mechanism is assumed to have a negative-exponential
distribution with frequency function of the form $\mu e^{-\mu t}$ ($1/\mu$ = the mean length
of an individual service time), the individual service times being independent of
each other and of the arrival times. An equivalent description of the service
mechanism is that individual departures form a Poisson process with parameter
$\mu$, independent of the arrivals, provided the queue is nonempty; when the queue
is empty, no departures can occur.
The quantity


$\rho = \lambda/\mu$


is known as the traffic intensity of the system. It is well known, [2], that, if $\rho < 1$,
then the distribution of the number, $n(t)$, of individuals in the queue at time $t$
approaches a limiting distribution as $t \to \infty$, independent of the initial queue
length. This limiting distribution is geometric with common ratio $\rho$. If $\rho \ge 1$,
then no such limiting distribution exists and the mean queue length becomes
infinite. This paper is concerned with the statistical estimation of $\rho$, as well as
the individual parameters $\lambda$ and $\mu$.

The most obvious method of estimating $\lambda$ and $\mu$ would be to observe the
operation of the queue for a fixed time $s$, note the number of arrivals $n$, the
number of departures $m$, and the busy time $\tau$ (that is, the total time during which
$n(t) > 0$). Then $\lambda$ would be estimated by $n/s$, and $\mu$ by $m/\tau$. In the non-stationary
case, $\rho \ge 1$, this may be the best one can do. However in the stationary case,
$\rho < 1$, the initial value of $n(t)$ is available, which, under the assumption that the
process has attained its stationary state, constitutes extra information from
which one should be able to make more accurate estimates. However, in order
to obtain maximum likelihood estimates, it is necessary to study the distribution
of the random variables involved under the condition that the total observation
time is fixed, and this turns out to be extremely complicated. In the following it
is noted that if the process is observed for a constant busy time $\tau$, rather than
total time, a considerable simplification is achieved and the problem then admits
of an elementary solution.


2. Sampling method. One observes that the time axis may be decomposed
into two random sequences of intervals: the busy intervals consisting of all times
when the queue size $n(t)$ is greater than zero, and the free intervals consisting of
all times when $n(t) = 0$. The assignment of the endpoints of these intervals is
arbitrary and immaterial. By the busy time between $t_1$ and $t_2$ is meant the sum
of the lengths of all the busy intervals, or parts of intervals, between time $t_1$
and time $t_2 > t_1$. During any busy interval arrivals and departures proceed as
independent Poisson processes with parameters $\lambda$ and $\mu$.

Let the process now be observed until the busy time reaches some preassigned
fixed value $\tau$, and the values of the following random variables noted:
$\nu = n(0)$, the initial queue size;
$m$ = the total number of departures during this period;
$T$ = the time of the $m$th departure;
$n$ = the total number of arrivals up to time $T$.


3. Construction of a likelihood function. It is assumed that the system is
proceeding in equilibrium, i.e., $\rho < 1$ and $\nu$ is taken to have a geometric distribution
with common ratio $\rho$.

Let us further define the random variables $x_i$ = the $i$th arrival time, $y_i$ = the
$i$th departure time, and $z_i$ = busy time up to the $i$th departure, $i = 1, 2, \cdots$
($x_i = y_i = z_i = 0$ for $i \le 0$).

Note that


$y_i = \max[y_{i-1}, x_{i-\nu}] + z_i - z_{i-1}.$


Thus $y_1, \cdots, y_m$ are determined recursively by $z_1, \cdots, z_m$ and $x_1, \cdots, x_{m-\nu}$,
and consequently the entire queuing process may be described by specifying $\nu$,
the sequence $x_1, x_2, \cdots$, and the sequence $z_1, z_2, \cdots$. The sequences $x_1,
x_2, \cdots$ and $z_1, z_2, \cdots$ represent the transition times of independent Poisson
processes having parameters $\lambda$ and $\mu$, both processes being independent of $\nu$.
Since $z_1, z_2, \cdots$ refers only to busy time, when arrivals and departures proceed
independently, the $x_1, x_2, \cdots$ process will be independent of the $z_1, z_2, \cdots$
process.
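
The recursion for the departure times can be traced with a few lines of code. The sketch below assumes the form $y_i = \max[y_{i-1}, x_{i-\nu}] + z_i - z_{i-1}$ with $x_i = y_i = z_i = 0$ for $i \le 0$, so that the first $\nu$ departures belong to individuals already present at time 0. The function name and argument layout are illustrative.

```python
def departure_times(nu, x, z):
    """Departure times y_1..y_m from the recursion
    y_i = max(y_{i-1}, x_{i-nu}) + z_i - z_{i-1}.

    nu : initial queue size
    x  : arrival times x_1, x_2, ... (at least m - nu of them)
    z  : cumulative busy times z_1, ..., z_m at the successive departures
    """
    y = []
    prev_y, prev_z = 0.0, 0.0
    for i, z_i in enumerate(z, start=1):
        idx = i - nu                               # arrival index of the i-th departer
        arrival = x[idx - 1] if idx >= 1 else 0.0  # present at time 0 when idx <= 0
        y_i = max(prev_y, arrival) + z_i - prev_z  # service starts, then lasts z_i - z_{i-1}
        y.append(y_i)
        prev_y, prev_z = y_i, z_i
    return y
```

For example, with one individual initially present, arrivals at times 2.0 and 3.0, and busy times 1.0, 2.5, 3.0, the departures occur at times 1.0, 3.5, and 4.0; the queue is empty between times 1.0 and 2.0.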

The likelihood function $L$ may now be constructed stepwise as follows:

(a) $\nu$ has a geometric distribution with frequency function

$\left(1 - \dfrac{\lambda}{\mu}\right)\left(\dfrac{\lambda}{\mu}\right)^{\nu}, \qquad \nu = 0, 1, 2, \cdots.$

(b) $m$ is a function of the $z_i$ only, namely, the maximum index for which
$z_m \le \tau$. Thus $m$ is independent of $\nu$ and has a Poisson distribution with
frequency function

$e^{-\mu\tau}\dfrac{(\mu\tau)^m}{m!}, \qquad m = 0, 1, 2, \cdots.$

(c) The conditional distribution of $z_1, \cdots, z_m$, given $m$, is that of a random
subdivision of a fixed interval of length $\tau$ into a fixed number, $m + 1$,
of parts, and is thus independent of $\mu$ (and, of course, of $\lambda$).

(d) When $\nu$ and $m$ are given, $x_1, \cdots, x_{m-\nu}$ will be independent of $z_1, \cdots, z_m$
and will have joint frequency function

$\lambda^{m-\nu} e^{-\lambda x_{m-\nu}}, \qquad 0 \le x_1 \le \cdots \le x_{m-\nu} < \infty.$

(e) When $\nu$, $m$, $z_1, \cdots, z_m$, and $x_1, \cdots, x_{m-\nu}$ are given, $T = y_m$ is determined,
and the number of arrivals from time $x_{m-\nu}$ to time $T$, namely
$n - m + \nu$, will have a Poisson distribution with frequency function

$e^{-\lambda(T - x_{m-\nu})}\dfrac{[\lambda(T - x_{m-\nu})]^{n-m+\nu}}{(n - m + \nu)!}, \qquad n - m + \nu = 0, 1, 2, \cdots, \quad n + \nu > 0.$

Multiplying all these conditional frequency functions together, one obtains for
the likelihood function


$L = \left(1 - \dfrac{\lambda}{\mu}\right) e^{-(\lambda T + \mu\tau)}\, \lambda^{n+\nu}\, \mu^{m-\nu}\, K,$


where $K$ depends on $\tau$, $m$, $n$, $z_1, \cdots, z_m$, and $x_{m-\nu}$, but not on $\lambda$ or $\mu$. This
formula was derived under the assumption that $m - \nu > 0$, but it is easily seen
that the same formula holds when $m - \nu \le 0$.


4. Maximum likelihood estimates and approximations. Standard methods
of obtaining maximum likelihood estimates can now be applied. It has been
assumed that the process is stationary, $\rho = (\lambda/\mu) < 1$. Consequently $\lambda$ and $\mu$
must be confined to the region $0 < \lambda < \mu$. It is easily seen that at least one
interior maximum exists. On differentiating $L$ with respect to $\lambda$ and $\mu$ and setting
the derivatives equal to zero, after some simplification, the following equations
for the maximum likelihood estimates $\hat\lambda$ and $\hat\mu$ are obtained:


$\hat\lambda = (\hat\mu - \hat\lambda)(n + \nu - \hat\lambda T),$
$\hat\lambda = (\hat\lambda - \hat\mu)(m - \nu - \hat\mu\tau).$

Substituting $\hat\lambda = \hat\rho\hat\mu$, and eliminating $\hat\mu$, one obtains the following quadratic
equation for the maximum likelihood estimate $\hat\rho$ of the traffic intensity $\rho$:


$f(\hat\rho) = (m - \nu - 1)T\hat\rho^2 - [(m - \nu)T + (n + \nu + 1)\tau]\hat\rho + (n + \nu)\tau = 0.$


Since $f(0) = (n + \nu)\tau > 0$, $f(1) = -\tau - T < 0$, exactly one solution of this
equation lies in the interval $0 < \hat\rho < 1$. This unique solution will be the required
estimate.

In order to obtain a simple rational approximation to $\hat\rho$, one notes that if the
terms $m - \nu - 1$ and $n + \nu + 1$ are replaced by $m - \nu$ and $n + \nu$ respectively
in the above quadratic equation, the resulting equation will have unity as one
root and the other root will be

$\rho_1 = \dfrac{(n + \nu)\tau}{(m - \nu)T}.$

Presumably, under certain conditions, $\rho_1$ will be an approximation to $\hat\rho$. More
precisely, by a straightforward computation one can show that, provided
$0 < \rho_1 < 1$, then


$\hat\rho < \rho_1$
and


0 21 
< —-p< ; 
- . (1 — pi)(m — v) 
Consequently, $\rho_1$ will be a good approximation to $\hat\rho$ whenever $\rho_1$ is bounded
away from unity and $m - \nu$ is large.

On substituting back, one obtains for the maximum likelihood estimates of
$\lambda$ and $\mu$


$\hat\lambda = \dfrac{(n + m)\hat\rho}{\hat\rho T + \tau}, \qquad \hat\mu = \dfrac{n + m}{\hat\rho T + \tau}.$

Whenever the approximation of $\rho_1$ for $\hat\rho$ is valid, the following simple approximations
for $\hat\lambda$ and $\hat\mu$ result:


$\hat\lambda \approx \dfrac{n + \nu}{T}, \qquad \hat\mu \approx \dfrac{m - \nu}{\tau}.$


Note the difference between these formulas and the formulas $n/T$ and $m/\tau$
which would result if the initial distribution were neglected, as mentioned in
Sec. 1.
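
The estimates of this section reduce to a few arithmetic steps. The sketch below solves the quadratic $(m - \nu - 1)T\rho^2 - [(m - \nu)T + (n + \nu + 1)\tau]\rho + (n + \nu)\tau = 0$ for its root in $(0, 1)$ and then forms $\hat\mu = (n + m)/(\hat\rho T + \tau)$ and $\hat\lambda = \hat\rho\hat\mu$, together with the rational approximation $\rho_1 = (n + \nu)\tau/[(m - \nu)T]$. It assumes $n + \nu > 0$; the function name, the returned dictionary, and the data in the example are illustrative and not taken from the paper.

```python
import math

def clarke_estimates(nu, m, n, tau, T):
    """Maximum likelihood estimates for a simple queue observed for busy time tau.

    nu : initial queue size       m : departures      n : arrivals up to time T
    tau: fixed busy time          T : time of the m-th departure
    """
    a = (m - nu - 1) * T
    b = -((m - nu) * T + (n + nu + 1) * tau)
    c = (n + nu) * tau
    if a == 0:
        rho = -c / b                        # the equation is linear in this case
    else:
        disc = math.sqrt(b * b - 4.0 * a * c)
        roots = [(-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)]
        rho = next(r for r in roots if 0.0 < r < 1.0)
    mu = (n + m) / (rho * T + tau)
    lam = rho * mu
    rho1 = (n + nu) * tau / ((m - nu) * T) if m > nu else None
    return {"rho": rho, "lambda": lam, "mu": mu, "rho1": rho1}
```

With the made-up data $\nu = 3$, $m = 50$, $n = 45$, $\tau = 40$, $T = 60$, the root in $(0, 1)$ is about 0.633, slightly below $\rho_1 \approx 0.681$.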


REFERENCES
[1] D. G. KENDALL, "Stochastic processes occurring in the theory of queues," Ann. Math.
Stat., Vol. 24 (1953), p. 338.
[2] D. G. KENDALL, "Some problems in the theory of queues," J. Roy. Stat. Soc., Ser. B,
Vol. 13 (1951), p. 151.
MOST POWERFUL RANK-TYPE TESTS
By D. A. S. FRASER


University of Toronto


For some non-parametric problems the use of the invariance principle reduces
the class of suitable tests to those based on ranks of ordered observations. To
obtain among these the test that is most powerful for some specified alternative
distribution, it is necessary to have the marginal probability distribution of the
rank statistic under the alternative. Hoeffding [1] gives a method that expresses
the probabilities of such a distribution in terms of an expectation taken with
respect to the hypothesis distribution. Applications have been made to the
problem of location (Hoeffding [1]) and to the problem of randomness (Lehmann
[2] and Terry [3]). We extend Hoeffding's method and, for the problem of location
with symmetry, derive a locally most powerful rank-type test against normal
alternatives.


Received November 13, 1956. 
Suppose a point in a sample space $X$ can be given by coordinates $r$, $t$ in such
a way that $X_r \times X_t = X$, where $X_r$, $X_t$ are the sample spaces for $r$, $t$ respectively.
Also, suppose that there are two probability measures on the space $X$, a
hypothesis measure


$M_1$:  $\Pr\{(r, t) \in R\} = \int_R f(r)\, g(t)\, d\mu(r)\, d\nu(t),$


where $f(r)$, $g(t)$ are, in fact, the marginal densities of $r$, $t$ with respect to the
$\sigma$-finite measures $\mu$, $\nu$ respectively, and an alternative measure


$M_2$:  $\Pr\{(r, t) \in R\} = \int_R p(r, t)\, d\mu(r)\, d\nu(t).$


We assume that $M_2$ is absolutely continuous with respect to $M_1$, that is,
$p(r, t)/f(r)g(t)$ is finite almost everywhere $\mu \times \nu$.

In the application we have in mind, $r$ will be like a "rank statistic" and $t$ will
be like an "order statistic". For the problem of randomness, $r$ would be the ranks
$(r_1, \cdots, r_n)$ of the $n$ observations $(x_1, \cdots, x_n)$, and $t$ would be the order statistics
$(x_{(1)}, \cdots, x_{(n)})$ where $x_{(1)}, \cdots, x_{(n)}$ are the numbers $x_1, \cdots, x_n$ arranged in
order of magnitude. It is of interest to remark in passing that the term order
statistic can be misleading; in calculating the "order statistic" one loses precisely
the information on the "order" in which the different values occur in the sample.

THEOREM. The marginal density function for $r$ under $M_2$ is


(1)  $f(r)\, E_{M_1}\left\{ \dfrac{p(r, t)}{f(r)\, g(t)} \,\Big|\, r \right\},$

which is the hypothesis density adjusted by the expectation of the density ratio taken
with respect to the marginal distribution of $t$ under the hypothesis.
PROOF. The proof is trivial. The marginal density function for $r$ is


$\int_{X_t} p(r, t)\, d\nu(t) = f(r) \int_{X_t} \dfrac{p(r, t)}{f(r)\, g(t)}\, g(t)\, d\nu(t) = f(r)\, E_{M_1}\left\{ \dfrac{p(r, t)}{f(r)\, g(t)} \,\Big|\, r \right\}.$

This theorem can be of use whenever there is a hypothesis measure for which the
marginal distribution of $t$ is simple enough that the expectation can be evaluated
or approximated easily.
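
A finite example makes the content of the theorem concrete. In the sketch below $r$ and $t$ each take two values with counting measure, the hypothesis densities `f`, `g` and the alternative joint density `p` are made-up numbers, and exact fractions are used so that the two ways of computing the marginal of $r$ can be compared for equality. All names and values are illustrative.

```python
from fractions import Fraction as Fr

# Hypothesis (product) densities and an alternative joint density on a 2 x 2 space.
f = {"r1": Fr(1, 2), "r2": Fr(1, 2)}
g = {"t1": Fr(1, 3), "t2": Fr(2, 3)}
p = {("r1", "t1"): Fr(1, 4), ("r1", "t2"): Fr(1, 4),
     ("r2", "t1"): Fr(1, 8), ("r2", "t2"): Fr(3, 8)}

def marginal_direct(r):
    """Marginal of r under the alternative: sum the joint density over t."""
    return sum(p[(r, t)] for t in g)

def marginal_via_expectation(r):
    """f(r) times the expectation of the density ratio over t drawn from g."""
    return f[r] * sum(p[(r, t)] / (f[r] * g[t]) * g[t] for t in g)
```

The two functions agree for each value of $r$, as the one-line proof requires; in applications the point is that the second form needs only the hypothesis distribution of $t$.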

EXAMPLE. Consider the problem of location. Let $x = (x_1, \cdots, x_n)$ be a sample
of $n$ from an absolutely continuous distribution, having density function $f(x)$
on the real line. For the problem


(2)  Hypothesis: Median $\{f(x)\} = 0$
     Alternative: Median $\{f(x)\} > 0$,


the sign test is uniformly most powerful. Changing the hypothesis, we obtain the
problem of location and symmetry


(3)  Hypothesis: $f(x)$ symmetric about $x = 0$
     Alternative: $f(x)$ nonsymmetric about $x = 0$.


This problem has a larger class of tests of a given size and the sign test does not
remain uniformly most powerful even with a one-sided alternative. Wilcoxon
[5] has proposed a sign rank test.

We consider the formulation (3). Any topological transformation of the positive
axis coupled with the same transformation of the negative axis obviously
leaves the problem unchanged. It is reasonable then to examine invariant tests
and in particular to find the maximal invariant function. Let $|x|_{(1)}, \cdots, |x|_{(n)}$
designate respectively the smallest, $\cdots$, the largest of the $n$ values $|x_1|, \cdots, |x_n|$.
Also, let $s_1, \cdots, s_n$ be the signs respectively of the $x$'s producing $|x|_{(1)}, \cdots, |x|_{(n)}$.
We take


$r = r(x) = (s_1, \cdots, s_n),$
$t = t(x) = (|x|_{(1)}, \cdots, |x|_{(n)}).$


$(r, t)$ does not provide coordinates on the whole sample space $R^n$; however it
does provide coordinates on the sample space of $x_{(1)}, \cdots, x_{(n)}$ which is a sufficient
statistic for the problem (the region having any coordinates equal,
$x_{(i)} = x_{(j)}$, has measure zero and is disregarded). $r$ is the maximal invariant
function. $t$ is a sufficient statistic for the measures of the hypothesis in (3). The
sample space of $r$ has $2^n$ points and they have equal probability under the
hypothesis.

To find an invariant test that is most powerful for a specific alternative, we 
need the marginal distribution of r under the alternative, and we can obtain 
this from the theorem above. An alternative of interest is the normal distribution 
with mean » and variance o. A reasonable hypothesis distribution to use with 
this for the theorem would be the normal distribution with mean 0 and variance 
o . We evaluate Pr {s;, --- , 8,}. Obviously, this depends only on u/o. Acccord- 
ingly, we set u/o = 6 and work with normal distributions having unit variance. 


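$\Pr\{s_1, \cdots, s_n\} = \dfrac{1}{2^n}\,E\left\{ \dfrac{\text{alternative density}}{\text{hypothesis density}} \,\Big|\, r \right\}$

$\quad = \dfrac{1}{2^n} \displaystyle\int_{0 < z_1 < \cdots < z_n < \infty} \exp\left\{ \delta \sum s_i z_i - \dfrac{n\delta^2}{2} \right\} n! \prod_{i=1}^{n} \dfrac{2}{\sqrt{2\pi}} \exp\left\{ -\dfrac{z_i^2}{2} \right\} dz_i$

$\quad = \dfrac{1}{2^n}\, e^{-n\delta^2/2}\, E\left\{ e^{\delta \sum s_i |z|_{(i)}} \right\},$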
where $|z|_{(1)}, \cdots, |z|_{(n)}$ are the order statistics for a sample of absolute values
from the standardized normal distribution. By applying the fundamental lemma
of hypothesis testing, we find the most powerful test has test statistic

$E\left\{ e^{\delta \sum s_i |z|_{(i)}} \right\},$

a function of $s_1, \cdots, s_n$. For $\delta$ small this can be approximated by

(4)    $E\{1 + \delta \sum s_i |z|_{(i)}\} = 1 + \delta \sum s_i E\{|z|_{(i)}\}.$

An equivalent statistic is

(5)    $\sum s_i E\{|z|_{(i)}\}.$

This is the Wilcoxon test statistic, $\sum s_i i$, with ranks replaced by expected order
statistics for a sample of absolute values from the standardized normal. The
distribution of (5) under the hypothesis can be shown to be asymptotically
normal by the use of the central limit theorem and a result of Hoeffding [6].
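
As a rough illustration of statistic (5), the following sketch estimates the expected order statistics of absolute standard normal values by simulation and evaluates both (5) and the signed rank sum for a small sample. The function names, the sample values, and the Monte Carlo estimate of the scores are assumptions made for illustration; they are not taken from the paper.

```python
import random


def expected_abs_normal_order_stats(n, draws=20000, seed=0):
    """Monte Carlo estimate of E{|z|_(i)}, i = 1..n (illustrative, not exact)."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(draws):
        sample = sorted(abs(rng.gauss(0.0, 1.0)) for _ in range(n))
        for i, value in enumerate(sample):
            totals[i] += value
    return [total / draws for total in totals]


def signs_by_absolute_rank(xs):
    """Signs s_1..s_n of the observations, ordered from smallest to largest |x|."""
    ordered = sorted(xs, key=abs)
    return [1 if x > 0 else -1 for x in ordered]


def normal_scores_statistic(xs, scores):
    """Statistic (5): sum of s_i * E{|z|_(i)}."""
    return sum(s * e for s, e in zip(signs_by_absolute_rank(xs), scores))


def signed_rank_statistic(xs):
    """Wilcoxon-type statistic: sum of s_i * i."""
    return sum(s * (i + 1) for i, s in enumerate(signs_by_absolute_rank(xs)))


# Hypothetical sample, used only to exercise the functions.
xs = [0.4, -0.1, 1.3, 0.9, -0.6, 2.1]
scores = expected_abs_normal_order_stats(len(xs))
print(normal_scores_statistic(xs, scores), signed_rank_statistic(xs))
```

The two statistics use the same signs and differ only in the scores attached to each absolute rank.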
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NORMAN C. SEVERO 


ASYMPTOTIC BEHAVIOR OF TESTS ON THE MEAN OF A
LOGARITHMICO-NORMAL DISTRIBUTION WITH
KNOWN VARIANCE¹


By NORMAN C. SEVERO²


Carnegie Institute of Technology 


1. Summary. Three tests were considered by Severo and Olds [1] for testing 
an hypothesis on a mean of a logarithmico-normal distribution with known 
variance. The purpose of this note is to discuss the asymptotic behavior of these 
tests for large sample size. 


2. Asymptotic properties of the tests for large n. We adopt the terminology
and notation of [1] in order to discuss the asymptotic properties of the $T_1$, $T_2$,
and $T_3$ tests for large sample size $n$. The particular cases considered in [1] indicate
that the power of each test increases as $n$ increases. The question arises as
to whether or not the approach is to some particular power function which has
well-known properties.

The $T_1$ test. When $\mu_x = {}_0\mu_x$ then $\beta_{T_1} = \Phi(z_\alpha)$ for all $n$. When $\mu_x > {}_0\mu_x$, then
the expression


Bh hte Ug Be V1 + ole 
"Vien  Wistne  Vitn 
is always greater than zero. Therefore

$\lim_{n \to \infty} \beta_{T_1} = \begin{cases} \Phi(z_\alpha) = 1 - \alpha, & \mu_x = {}_0\mu_x \\ \Phi(-\infty) = 0, & \mu_x > {}_0\mu_x \end{cases}$

which is simply the ideal operating characteristic of a statistical test. Thus,
increasing the sample size does not alter the functional form of the operating
characteristic of the $T_1$ test.

The $T_2$ test. The $T_2$ test involves the mean $\bar{x}$ of $n$ logarithmico-normal variates
each having the same mean $\mu_x$, and the same variance 1. By an application of
the Lindeberg-Levy form of the Central Limit Theorem [2], $\bar{x}$ is asymptotically
$N(\mu_x, 1/n)$. Hence, for large $n$, it follows that the operating characteristic of the
$T_2$ test at any $\mu_x > {}_0\mu_x$ may be approximated by

$\beta_{T_2} = \Phi\{z_\alpha - (\mu_x - {}_0\mu_x)\sqrt{n}\} = \Phi\{z_\alpha - \delta\sqrt{n}\}.$

Thus, for large sample sizes the $T_2$ test behaves like the most powerful one-sided
test for testing a simple hypothesis on the mean of a normal distribution with
known variance.

Received March 21, 1957; revised April 17, 1957.


The $T_3$ test. The discussion of the asymptotic behavior of the $T_3$ test for large
$n$ employs the notation:


(1) 


(2) E(E,) = c =m, 


where $E(z)$ stands for the expected value of $z$. The noncentral $\chi'^2$ variate involved
in the $T_3$ test may then be written as


(3) x” - 


with parameters $\lambda = nm^2$ and $n$.

The large sample behavior of the $T_3$ statistic is summarized in the following
theorem which follows as a direct application of the Lindeberg-Levy form of the
Central Limit Theorem.

THEOREM. The noncentral $\chi'^2$ variate given by (3) is asymptotically $N[n(1 + m^2),\ 2n(1 + 2m^2)]$
as $n$ approaches infinity.

Thus, as $n$ gets large, the $\chi'^2$ distribution may be approximated by a normal
distribution with mean $n(1 + m^2)$ and variance $2n(1 + 2m^2)$ where $m$ is given
by (2). This suggests that for large $n$ the cut-off point for the $T_3$ test criterion
may be approximated by

$\chi'^2_{\alpha,n} = z_\alpha \sqrt{2n(1 + 2m_0^2)} + n(1 + m_0^2),$

where $m_0$ denotes the value of $m$ evaluated at $\mu_x = {}_0\mu_x$.


Similarly, for large $n$, the theorem enables the operating characteristic of the
$T_3$ test at any $\mu_x > {}_0\mu_x$ to be approximated by

$\beta_{T_3} = P\left\{ \chi'^2 < z_\alpha \sqrt{2(1 + 2m_0^2)n} + (1 + m_0^2)n \right\}$

$\quad = P\left\{ \dfrac{\chi'^2 - (1 + m^2)n}{\sqrt{2(1 + 2m^2)n}} < \dfrac{z_\alpha \sqrt{2(1 + 2m_0^2)} - (m^2 - m_0^2)\sqrt{n}}{\sqrt{2(1 + 2m^2)}} \right\}$

(4)    $\quad = \Phi\left\{ \dfrac{z_\alpha \sqrt{2(1 + 2m_0^2)} - (m^2 - m_0^2)\sqrt{n}}{\sqrt{2(1 + 2m^2)}} \right\}.$

Hence, when $n$ is large, the functional form of the $T_3$ test and of its operating
characteristic is replaced by the normal function. The rate of convergence of
(4) is slow and for that reason approximations to the noncentral $\chi'^2$ suggested by
Patnaik [3] or Abdel-Aty [4] are recommended in practice.
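
The normal approximation can be checked against simulation with a short sketch. It assumes that the noncentral variate is a sum of $n$ squared $N(m, 1)$ variables (so that $\lambda = nm^2$) and that the operating characteristic is the probability of falling below the approximate cut-off point; the function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def normal_cutoff(n, m0, alpha):
    """Approximate cut-off: z_alpha * sqrt(2n(1 + 2 m0^2)) + n(1 + m0^2)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return z * math.sqrt(2.0 * n * (1.0 + 2.0 * m0 ** 2)) + n * (1.0 + m0 ** 2)


def normal_oc(n, m, m0, alpha):
    """Normal approximation to the operating characteristic at the value m."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    numerator = z * math.sqrt(2.0 * (1.0 + 2.0 * m0 ** 2)) - (m ** 2 - m0 ** 2) * math.sqrt(n)
    return NormalDist().cdf(numerator / math.sqrt(2.0 * (1.0 + 2.0 * m ** 2)))


def simulated_oc(n, m, m0, alpha, reps=3000, seed=1):
    """Fraction of simulated noncentral chi-square values below the cut-off."""
    rng = random.Random(seed)
    cutoff = normal_cutoff(n, m0, alpha)
    below = 0
    for _ in range(reps):
        chi2 = sum(rng.gauss(m, 1.0) ** 2 for _ in range(n))
        if chi2 < cutoff:
            below += 1
    return below / reps


# Illustrative values only: n = 100, m0 = 0.5, alpha = 0.05.
print(normal_oc(100, 0.5, 0.5, 0.05), simulated_oc(100, 0.5, 0.5, 0.05))
print(normal_oc(100, 0.7, 0.5, 0.05), simulated_oc(100, 0.7, 0.5, 0.05))
```

Any gap between the two columns at moderate $n$ reflects the slow convergence noted above.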





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A t-TEST FOR THE SERIAL CORRELATION COEFFICIENT 
By JOHN S. WHITE


Aero Division, Minneapolis-Honeywell Regulator Company, 
Minneapolis, Minnesota 


Summary. Let $r$ be the sample serial correlation coefficient computed from a
sample of size $N$ drawn from a serially correlated process with parameter $\rho$. It
is shown that the statistic

$t = \dfrac{(r - \rho)\sqrt{N + 1}}{\sqrt{1 - r^2}}$

is approximately distributed as Student's $t$ with $N + 1$ degrees of freedom.


Introduction. Let $(x_t)$ be a discrete process satisfying the stochastic difference
equation

$x_t = \rho x_{t-1} + u_t \qquad (t = 1, 2, \cdots)$

where the $u$'s are NID (0, 1) and $\rho$ is an unknown parameter. If, considering a
sample of size $N$, we assume that $x_{N+1} = x_1$, then the distribution of the $x$'s is
uniquely determined by that of the $u$'s and the $x$'s are said to be circularly correlated.
The parameter $\rho$ is called the (circular) serial correlation coefficient
and may be estimated by

$r = \dfrac{\sum_{t=1}^{N} x_t x_{t+1}}{\sum_{t=1}^{N} x_t^2}.$

Leipnik [1] obtained the following as an approximate (say $N > 20$) distribution
for $r$

$f(x) = \dfrac{1}{B\left(\frac{1}{2}, \frac{N+1}{2}\right)}\, \dfrac{(1 - x^2)^{(N-1)/2}}{(1 + \rho^2 - 2\rho x)^{N/2}}.$

The t-transformation. We shall now make the change of variable from $r$ to $t$
in Leipnik's distribution. This change of variable could be made in one step;
however, it seems more appealing to make a series of preliminary transformations.
Let

$z = \arcsin x,$

$w = (\sin z - \rho)/\cos z,$

$t = \dfrac{(x - \rho)\sqrt{N + 1}}{\sqrt{1 - x^2}} = w\sqrt{N + 1}.$

The density function for $t$ is then

$f(u) = \dfrac{1}{\sqrt{N+1}\; B \left(1 + \frac{u^2}{N+1}\right)^{(N+2)/2}} - \dfrac{\rho u}{(N + 1)\, B \left(1 + \frac{u^2}{N+1}\right)^{(N+2)/2} \sqrt{1 + \frac{u^2}{N+1} - \rho^2}}$

$\quad =$ (say) $s_{N+1}(u) + h(u),$

where $B = B(1/2, [N + 1]/2)$.


Applications. The function $s_{N+1}(u)$ is immediately recognized as the density
function for Student's $t$ distribution with $N + 1$ degrees of freedom. Since $h(u)$
is an odd function, probabilities associated with absolute $t$ value may be read
directly from a standard table of the $t$ distribution. For example a symmetric
95% confidence interval for $\rho$ will be of the form

$r + t_{.025} \sqrt{\dfrac{1 - r^2}{N + 1}} < \rho < r + t_{.975} \sqrt{\dfrac{1 - r^2}{N + 1}}$

where $t_{.025}$ and $t_{.975}$ are the 2.5 and 97.5 percentile points of the $t$ distribution
with $N + 1$ degrees of freedom.
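
A minimal sketch of the interval computation follows. It assumes the circular estimate $r = \sum x_t x_{t+1} / \sum x_t^2$ with $x_{N+1} = x_1$ and a known zero mean, and it takes the percentile point of the $t$ distribution as an input read from a table; the function names and the numerical values are illustrative only.

```python
import math


def serial_correlation(xs):
    """Circular serial correlation: sum x_t x_{t+1} / sum x_t^2, with x_{N+1} = x_1."""
    n = len(xs)
    numerator = sum(xs[t] * xs[(t + 1) % n] for t in range(n))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def rho_interval(r, n_obs, t_crit):
    """Symmetric interval r -/+ t_crit * sqrt((1 - r^2) / (N + 1)).

    t_crit is the upper percentile point of Student's t with N + 1 degrees
    of freedom, supplied by the caller (for example from a printed table).
    """
    half_width = t_crit * math.sqrt((1.0 - r * r) / (n_obs + 1))
    return r - half_width, r + half_width


# Hypothetical values: r = 0.5 from N = 24 observations, and 2.060 as the
# tabled 97.5 percentile point of t with 25 degrees of freedom.
low, high = rho_interval(0.5, 24, 2.060)
print(low, high)
```

Because $t_{.025} = -t_{.975}$, a single tabled value suffices for the symmetric interval.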
Probabilities not associated with symmetric intervals about the origin will
require the evaluation of integrals of $h(u)$. A basic probability which might be
considered is

$\text{Prob}\,(a < t < \infty) = \int_a^\infty f(u)\,du = \int_a^\infty s_{N+1}(u)\,du + \int_a^\infty h(u)\,du.$

The integral of $s_{N+1}(u)$ may be found from a table of the $t$ distribution and need
not concern us here. The problem then is to calculate

$R(a) = \int_a^\infty h(u)\,du.$

With some manipulation it may be shown that

$R(a) = -\dfrac{1}{2\rho^N}\, I_x\left(\dfrac{N+1}{2}, \dfrac{1}{2}\right),$

where $x = \rho^2[1 + a^2/(N + 1)]^{-1}$ and $I_x(\frac{1}{2}(N + 1), \frac{1}{2})$ is Karl Pearson's notation
for the Incomplete Beta-Function as tabled in [2].
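
One possible route to this expression is sketched below. It assumes $0 < \rho < 1$, $a \ge 0$, and the odd part of the density in the form $h(u) = -\rho u \big/ \big[(N+1)\,B\,(1 + u^2/(N+1))^{(N+2)/2}\sqrt{1 + u^2/(N+1) - \rho^2}\,\big]$ with $B = B(1/2, [N+1]/2)$; the substitutions are a reconstruction and not necessarily the author's own steps.

```latex
\begin{aligned}
R(a) &= \int_a^\infty h(u)\,du
      = -\frac{\rho}{(N+1)B}\int_a^\infty
        \frac{u\,du}{\left(1+\frac{u^2}{N+1}\right)^{(N+2)/2}
        \sqrt{1+\frac{u^2}{N+1}-\rho^2}} \\
     &= -\frac{\rho}{2B}\int_{v_a}^\infty v^{-(N+2)/2}\,(v-\rho^2)^{-1/2}\,dv,
        \qquad v = 1+\frac{u^2}{N+1},\quad v_a = 1+\frac{a^2}{N+1}, \\
     &= -\frac{1}{2\rho^{N}B}\int_0^{x} s^{\frac{N+1}{2}-1}\,(1-s)^{-1/2}\,ds,
        \qquad s = \frac{\rho^2}{v},\quad x = \frac{\rho^2}{v_a}, \\
     &= -\frac{1}{2\rho^{N}}\; I_x\!\left(\frac{N+1}{2},\frac{1}{2}\right).
\end{aligned}
```

The last step uses $\int_0^x s^{p-1}(1-s)^{q-1}\,ds = B(p,q)\,I_x(p,q)$ with $p = (N+1)/2$, $q = 1/2$, and the symmetry $B(p,q) = B(q,p)$.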

In the preceding discussion it has been assumed that the mean of the process
$(x_t)$ is known to be zero. If the mean must be estimated from the sample, the
serial correlation coefficient will be

$r' = \dfrac{\sum_{t=1}^{N} (x_t - \bar{x})(x_{t+1} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2}.$

All of the results concerning $r$ also hold true for $r'$ with $N$ degrees of freedom
rather than $N + 1$.
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GROUPS AND CONDITIONAL MONTE CARLO 


By J. G. WENDEL 


University of Michigan 


Summary. The conditional Monte Carlo technique advanced by Tukey et al.
[1, 2] has been explained in analytic terms by Hammersley [3]. This note offers
an alternative explanation, wherein the group-theoretic aspect of the problem 
plays the dominant role. The method is illustrated on an example simpler than 
that treated in [1, 2]. 


The framework. Throughout this note $X$ will be a random vector in euclidean
$n$-space $\mathfrak{X}$, having distribution function $G$. $F$ will denote a distribution function
absolutely continuous with respect to $G$, with Radon-Nikodym derivative
$dF/dG = w$, so that

$F(M) = \int_M w(x)\,dG(x)$

for all Borel sets $M$, and

$\int \varphi(x)\,dF(x) = \int \varphi(x)w(x)\,dG(x)$

for Borel functions $\varphi$. It is standard in this situation to call $w$ a weight and to
say that $X$ (drawn from $G$) with weight $w(X)$ is a sample from $F$; thus for Borel $\varphi$

$E_G(\varphi(X)w(X)) = E_F(\varphi(X))$

where the subscripts on the expectation operators specify the population from
which $X$ is sampled. From now on we assume without essential loss of generality
that $F$ and $G$ come from densities $f$ and $g$, so that $w(x) = f(x)/g(x)$ or 0 according
as $g(x) \ne 0$ or $= 0$.

Received January 28, 1957; revised April 12, 1957.


The problem. A more interesting case arises when conditional expectations are 
desired. It is by no means apparent that it will in general be possible to find a 
weight-function w* and an appropriate modification X* of X so that 


$E_G(\varphi(X^*)w^*(X)) = E_F(\varphi(X) \mid \text{condition on } X)$


identically in $\varphi$, but in fact the main theme of [1, 2] was that highly non-trivial
instances do exist. Their problem can be put as follows:

$\mathfrak{X}$: Euclidean $n$-space;

$\mathfrak{A}$: a locally compact, not necessarily Abelian, group of 1-1 differentiable
transformations acting on $\mathfrak{X}$, such that the mapping $(a, x) \to ax$
$(a \in \mathfrak{A},\ x \in \mathfrak{X})$ is measurable;

$\partial(ax)/\partial x$: the Jacobian of $a \in \mathfrak{A}$ at $x \in \mathfrak{X}$;

$dm(a)$: a fixed right-invariant Haar measure over $\mathfrak{A}$;

$A$: a left-homogeneous mapping defined on almost all of $\mathfrak{X}$ onto $\mathfrak{A}$, so that
$A(ax) = aA(x)$ for all $a$ and all $x$ in the domain of $A$;

$\sigma$: the density function of $A$, assumed to exist; thus, for Borel sets $B \subset \mathfrak{A}$,

$\Pr\{A(X) \in B\} = \int_B \sigma(a)\,dm(a) = \int_{\mathfrak{X}} I_B(A(x))f(x)\,dx$

where $I_B$ is the indicator of $B$, and $X$ of course has density $f$.

In [1, 2] the group $\mathfrak{A}$ consisted of the multiplicative group of positive reals,
acting as dilations on $\mathfrak{X}$; then $dm(a)$ can be taken to be $da/a$, and the Jacobian
is just $a^n$.

The problem is to express $E_F(\varphi(X) \mid A(X) = a_0)$ as an unconditional expectation
$E_G(\varphi(X^*)w^*(X))$, where $X$ is sampled from density $g$, $X^*$ is a suitable modification
of $X$, and $w^*$ is an appropriate weight. This is certainly natural in the
Monte Carlo setting, for it would save us from having to waste most of our observations,
namely those $X$ for which $A(X)$ is not reasonably close to $a_0$.


Development of solution. (The formulae set down in this section are those of
[1], but interpreted in the broader setting and subjected to formal proof.)

In view of the homogeneity of $A$ the obvious choice of modification $X \to X^*$
is to force the condition $A(X^*) = a_0$ to hold. This will be achieved if we take
$X^* = a_0A(X)^{-1}X = \alpha X$, where $\alpha$ denotes $a_0A(X)^{-1}$ and is, like $X$, a random
variable. Finding the appropriate weights will occupy the remainder of this
section.

LEMMA. Suppose that $X$ with weight $w(X)$ is a sample from density $f$. Then for
each $a \in \mathfrak{A}$, $aX$ with weight $w(a, X)$ is a sample from $f$ too, where

$w(a, x) = w(x)\,\{f(ax)/f(x)\}\,|\partial(ax)/\partial x|.$

PROOF. Write $Y = aX$. We want to evaluate $\int \varphi(y)f(y)\,dy$ for Borel $\varphi$. But
this is

$\int \varphi(ax)f(ax)\,d(ax) = \int \varphi(ax)f(ax)\,|\partial(ax)/\partial x|\,dx$

$\quad = \int \varphi(ax)\,\{f(x)/g(x)\}\,\{f(ax)/f(x)\}\,|\partial(ax)/\partial x|\,g(x)\,dx$

$\quad = \int \varphi(ax)\,w(x)\,\{f(ax)/f(x)\}\,|\partial(ax)/\partial x|\,g(x)\,dx$

$\quad = \int \varphi(ax)\,w(a, x)\,g(x)\,dx$

as was to be shown.

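THEOREM. Let $\mu$ be an arbitrary density over $\mathfrak{A}$. For each $a \in \mathfrak{A}$ suppose that
$aX$ with weight $w(a, X)$ is a sample from density $f$. Then, with $A(x)$ as above,
$\alpha = a_0A(x)^{-1}$ (so that $A(\alpha x) = a_0$) and $X^* = \alpha X$, we have $X^*$ with weight $w^*$ is a
sample from $f$ conditioned by $A(X^*) = a_0$; the weight $w^*$ is given by

$w^*(x) = w^*(a_0, x) = \sigma(a_0)^{-1}\,\mu(a_0A(x)^{-1})\,w(a_0A(x)^{-1}, x).$

PROOF. For simplicity write $Y$ in place of $X^*$. We want to evaluate
$E_F(\varphi(Y) \mid A(Y) = a_0) =_{\text{def}} \psi(a_0)$. $\psi$ is characterized (up to almost-everywhere
equivalence) by

(1)    $\int_B \psi(a)\sigma(a)\,dm(a) = \int_{A(y) \in B} \varphi(y)f(y)\,dy = \int_{\mathfrak{X}} \tilde{\varphi}(y)f(y)\,dy$

where $\tilde{\varphi} = \varphi I_{A^{-1}(B)}$, $B$ a Borel subset of $\mathfrak{A}$. By hypothesis we have for all Borel
$\tilde{\varphi}$ and each $\beta \in \mathfrak{A}$,

(2)    $\int_{\mathfrak{X}} \tilde{\varphi}(y)f(y)\,dy = \int_{\mathfrak{X}} \tilde{\varphi}(\beta x)\,w(\beta, x)\,g(x)\,dx.$

Multiplying both sides of (2) by $\mu(\beta)$ and integrating over $\mathfrak{A}$ with respect to
$dm(\beta)$ gives

(3)    $\int_{\mathfrak{X}} \tilde{\varphi}(y)f(y)\,dy = \int_{\mathfrak{A}} \mu(\beta)\,dm(\beta) \int_{\mathfrak{X}} \tilde{\varphi}(\beta x)\,w(\beta, x)\,g(x)\,dx$

$\qquad\quad = \int_{\mathfrak{X}} \int_{\mathfrak{A}} \mu(\beta)\,\tilde{\varphi}(\beta x)\,w(\beta, x)\,dm(\beta)\,g(x)\,dx$

where the interchange of order of integration is justified by Fubini's theorem.
Putting $\beta = aA(x)^{-1}$ and invoking right invariance replaces the right side of
(3) by

$\int_{\mathfrak{X}} \int_{\mathfrak{A}} \mu(aA(x)^{-1})\,\tilde{\varphi}(aA(x)^{-1}x)\,w(aA(x)^{-1}, x)\,dm(a)\,g(x)\,dx$

$\quad = \int_{\mathfrak{X}} \int_B \mu(aA(x)^{-1})\,\varphi(aA(x)^{-1}x)\,w(aA(x)^{-1}, x)\,dm(a)\,g(x)\,dx$

$\quad = \int_B \sigma(a)\,dm(a) \int_{\mathfrak{X}} \sigma(a)^{-1}\mu(aA(x)^{-1})\,\varphi(aA(x)^{-1}x)\,w(aA(x)^{-1}, x)\,g(x)\,dx$

$\quad = \int_B \sigma(a)\,dm(a) \int_{\mathfrak{X}} w^*(a, x)\,\varphi(aA(x)^{-1}x)\,g(x)\,dx$

and the result now follows on comparison with the left member of (1).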


The solution. Combining the formulae of lemma and theorem shows that, for
$X$ drawn from $G$ and weighted by

$w^*(x) = w^*(a_0, x)$
$\quad = \sigma(a_0)^{-1}\,w(x)\,\{f(a_0A(x)^{-1}x)/f(x)\}\,|\partial(a_0A(x)^{-1}x)/\partial x|\,\mu(a_0A(x)^{-1}),$

the average of $\varphi(X^*)w^*(X)$ yields the desired conditional expectation, where
$X^* = \alpha X$, with $\alpha$ chosen so that $A(X^*) = A(\alpha X) = a_0$. The arbitrariness of
$\mu$ may seem peculiar, but its role may be clarified by the following simple example.

EXAMPLE. Let $X$ be a scalar random variable with density $f$. It is desired to
find $E_F(\varphi(X) \mid X > 0)$ by sampling. To put this in the present framework let
$\mathfrak{A}$ = two-element group of numbers $+1$, $-1$ acting multiplicatively on $\mathfrak{X}$, the
reals. Let $A(x) = +1$ or $-1$ according as $x >$ or $< 0$, with $A(0)$ arbitrary. Then
the homogeneity property is clearly satisfied, and the problem amounts to finding
$E_F(\varphi(X) \mid A(X) = +1)$. The Haar measure $dm(a)$ over $\mathfrak{A}$ is defined by placing
unit mass at each of $+1$, $-1$. The density $\sigma$ is then

$\sigma(+1) = p = \Pr\{A(X) = +1\} = \Pr\{X > 0\} = \int_0^\infty f(x)\,dx,$

$\sigma(-1) = 1 - p.$

For $\mu(a)$ we pick a number $\lambda$ between zero and one inclusive and set $\mu(+1) = \lambda$,
$\mu(-1) = 1 - \lambda$. The weight $w(x)$ is identically one, as is the absolute value of
the Jacobian. Substituting all this information into the formula for $w^*$ we obtain

$w^*(+1, x) = w^*(x) = (1/p)\,\{f(A(x)^{-1}x)/f(x)\}\,\mu(A(x)^{-1}),$

i.e.

$w^*(x) = \begin{cases} \lambda/p & \text{if } x > 0, \\ (1/p)\,\{f(-x)/f(x)\}\,(1 - \lambda) & \text{if } x < 0. \end{cases}$

It is easy to verify directly now that $E_F(\varphi(X^*)w^*(X)) = E_F(\varphi(X) \mid X > 0)$; the
Monte Carlo procedure will be: observe values $X = x_1, x_2, \cdots, x_n$ from the
density $f$; for each $x_i$ compute $x_i^* = |x_i|$ and $w^*(x_i)$, the appropriate expression
from above, and use $(1/n) \sum \varphi(|x_i|)\,w^*(x_i)$ as the desired estimate.

Naturally one would wish to choose $\lambda$ so as to minimize the variance or, what
comes to the same thing in view of unbiassedness, some multiple of the mean
square of the estimator. We find that

$p^2 E(\{\varphi w^*\}^2) = \lambda^2 \int_0^\infty \varphi^2(y) f(y)\,dy + (1 - \lambda)^2 \int_0^\infty \varphi^2(y)\,\dfrac{f^2(y)}{f(-y)}\,dy$

$\quad = \lambda^2 J_1 + (1 - \lambda)^2 J_2,$

say, which is minimized by setting

$\lambda = J_2/(J_1 + J_2).$

In case $f$ is symmetric $J_1 = J_2$ and optimum $\lambda = 1/2$; here the naive procedure of
rejecting negative $x$'s corresponds to $\lambda = 1$ and maximizes the mean square!
However, if $f(-y) = 0$ over a stretch in which $f(y) > 0$ then $J_2 = \infty$, and we
must take $\lambda = 1$, that is, adopt the naive solution, in order to obtain finite variance of
estimate. Finally, in case $\varphi(y)$ and $f(-y)/f(y)$ have large similar peaks near some
$y_0 > 0$ then $J_1$ may be very much larger than $J_2$ and optimum $\lambda$ very close to 0.
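
The example can be tried numerically with the sketch below. It assumes, purely for illustration, that $f$ is the $N(m, 1)$ density, so that $f(-x)/f(x) = e^{-2mx}$ and $p = \Pr\{X > 0\}$ is available in closed form; the function name, the choice of $\varphi(x) = x$, and the parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def conditional_mean_positive(phi, mean, lam, n=200000, seed=0):
    """Estimate E(phi(X) | X > 0) for X ~ N(mean, 1), using every draw.

    Positive draws get weight lam / p; negative draws are reflected to |x|
    and get weight (1 - lam) / p * f(-x) / f(x), where f(-x) / f(x) equals
    exp(-2 * mean * x) for the N(mean, 1) density.
    """
    rng = random.Random(seed)
    p = 1.0 - NormalDist(mean, 1.0).cdf(0.0)   # Pr{X > 0}
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mean, 1.0)
        if x > 0:
            weight = lam / p
        else:
            weight = (1.0 - lam) / p * math.exp(-2.0 * mean * x)
        total += phi(abs(x)) * weight
    return total / n


def identity(x):
    return x


# lam = 1 is the naive procedure (negative draws contribute nothing);
# lam = 0.5 is the optimum when f is symmetric about zero.
print(conditional_mean_positive(identity, 0.0, 0.5))
print(conditional_mean_positive(identity, 0.3, 0.5))
print(conditional_mean_positive(identity, 0.3, 1.0))
```

For a normal density with mean $m$ and unit variance the target value is $m + \phi(m)/\Phi(m)$, which gives a direct check on the estimates for any admissible $\lambda$.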
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TABLES FOR TYPE A CRITICAL REGIONS 
By Harry WEINGARTEN 


Bureau of Ships 


1. This note provides tables connected with work by Neyman [1] and Johnson
[2] on testing hypotheses, expanding the table given in [2]. This table, as expanded,
provides solutions for the values of $A$ satisfying

$\dfrac{1}{2\pi} \displaystyle\int_{-\infty}^{\infty} \int_{A - Bu^2}^{\infty} e^{-\frac{1}{2}(u^2 + v^2)}\,dv\,du = \alpha$

for $\alpha = .01, .05$, and $B = 0(.1)5,\ 5(1)10,\ 10(10)100$.
When $\alpha = .05$ set $A = 3.8414588B + p_{.05}$, and when
$\alpha = .01$ set $A = 6.6348966B + p_{.01}$.
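
Entries of this kind can be spot-checked numerically. The sketch below assumes that the defining condition is $\Pr\{v > A - Bu^2\} = \alpha$ for independent standard normal $u$ and $v$, integrates over $u$ by Simpson's rule, and solves for $A$ by bisection; the function names, integration range, and step count are illustrative choices, and the step count would need to grow for large $B$.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def region_probability(a, b, half_width=10.0, steps=4000):
    """Pr{v > a - b*u^2} for independent standard normal u, v (Simpson's rule in u)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        u = -half_width + i * h
        density = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        value = density * (1.0 - std_normal_cdf(a - b * u * u))
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * value
    return total * h / 3.0


def solve_a(b, alpha, tol=1e-9):
    """Bisection for A; the probability decreases as A increases."""
    lo, hi = -10.0, 10.0 + 40.0 * b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if region_probability(mid, b) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# p is recovered by subtracting the linear term from A.
for b in (0.0, 1.0):
    print(b, solve_a(b, 0.05) - 3.8414588 * b)
```

At $B = 0$ the solution reduces to the upper percentage point of the standard normal distribution.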
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TABLE 1 


    B        p.05       p.01

   0.0     1.64484    2.32634
   0.1     1.37901    1.79269
   0.2     1.15224    1.34283
   0.3     0.96047    0.99547
   0.4     0.79869    0.75054
   0.5     0.66460    0.59093
   0.6     0.55679    0.48717
   0.7     0.47283    0.41535
   0.8     0.40873    0.36237
   0.9     0.35972    0.32153
   1.0     0.32153    0.28902
   1.1     0.29097    0.26252
   1.2     0.26590    0.24049
   1.3     0.24491    0.22188
   1.4     0.22704    0.20595
   1.5     0.21164    0.19216
   1.6     0.19821    0.18011
   1.7     0.18640    0.16948
   1.8     0.17593    0.16004
   1.9     0.16657    0.15160
   2.0     0.15817    0.14400
   2.1     0.15058    0.13713
   2.2     0.14368    0.13088
   2.3     0.13739    0.12518
   2.4     0.13163    0.11996
   2.5     0.12633    0.11515
   2.6     0.12145    0.11072
   2.7     0.11693    0.10661
   2.8     0.11273    0.10280
   2.9     0.10883    0.09925
   3.0     0.10519    0.09594
   3.1     0.10178    0.09284
   3.2     0.09859    0.08994
   3.3     0.09559    0.08721
   3.4     0.09277    0.08464
   3.5     0.09012    0.08222
   3.6     0.08761    0.07993
   3.7     0.08523    0.07777
   3.8     0.08299    0.07572
   3.9     0.08085    0.07378
   4.0     0.07883    0.07193
   4.1     0.07690    0.07018
   4.2     0.07507    0.06851
   4.3     0.07332    0.06691
   4.4     0.07165    0.06539
   4.5     0.07006    0.06393
   4.6     0.06853    0.06254
   4.7     0.06707    0.06121
   4.8     0.06568    0.05994
   4.9     0.06434    0.05871
   5.0     0.06305    0.05754
   6.      0.05254    0.04794
   7.      0.04504    0.04108
   8.      0.03943    0.03594
   9.      0.03506    0.03194
  10.      0.03157    0.02874
  20.      0.01591    0.01432
  30.      0.01075    0.00949
  40.      0.00821    0.00706
  50.      0.00673    0.00559
  60.      0.00576    0.00461
  70.      0.00510    0.00389
  80.      0.00463    0.00335
  90.      0.00429    0.00292
 100.      0.00401    0.00257
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This work was done on the Univac located in the Applied Mathematics Labora- 
tory, David Taylor Model Basin, Navy Department. 


REFERENCES 

[1] J. NEYMAN, "Tests of statistical hypotheses which are unbiased in the limit," Ann.
Math. Stat., Vol. 9 (1938), p. 69.

[2] N. L. JOHNSON, "Parabolic test for linkage," Ann. Math. Stat., Vol. 11 (1940), p. 227.







A NOTE ON BIBDS 


By RALPH G. STANTON
University of Toronto 


It is well known that the parameters $(v, b, r, k, \lambda)$ of a balanced incomplete
block design (BIBD) satisfy the relations

(1)    $bk = rv,$
(2)    $\lambda(v - 1) = r(k - 1),$
(2')   $r - \lambda = rk - \lambda v.$

Fisher [2] proved that

(3)    $b \ge v,$

and Bose [1] showed that for a resolvable BIBD one has

(4)    $b \ge v + r - 1.$

Nair [3] proved the inequality

(5)    $b \ge 1 + \dfrac{k(r - 1)^2}{(r - k) + \lambda(k - 1)}$

for any BIBD, and

(6)    $b \ge \dfrac{kr(r - 1)}{(r - k) + \lambda(k - 1)}$

for a resolvable BIBD. While it was originally claimed that (5) and (6) are
sharper results than (3) and (4), it is the purpose of this note to show that this
is not so; (5) and (6) are completely equivalent to (3) and (4).

We first put (4) in a neater form by writing it as $(b - r)k \ge k(v - 1)$; using
(1) and (2),

$rv - \lambda(v - 1) - r \ge k(v - 1),$

$(v - 1)(r - \lambda) \ge k(v - 1).$

Since $v - 1 > 0$, (4) is equivalent to

(7)    $r - \lambda \ge k.$

We now take Nair's inequality (6); using (1), it is equivalent to
$v(r - k + \lambda k - \lambda) \ge k^2(r - 1)$. Applying (2'), and the fact that $v - k > 0$,
we then obtain

$k(r - 1)(v - k) \ge \lambda v(v - k),$
$k(r - 1) \ge \lambda v,$
$r - \lambda \ge k.$

This demonstrates the equivalence of (4), (6), and (7).

Finally, employing (1) and (2'), (5) is equivalent to $rv(rk - \lambda v - k + \lambda k) \ge
k(\lambda k - \lambda v + kr^2 - kr)$. Grouping the terms in $\lambda$, we have

(8)    $kr(r - 1)(v - k) \ge \lambda(v - k)(vr - k).$

If we apply (1) to (8), we get relation

(9)    $r(r - 1) \ge \lambda(b - 1);$

however, applying (2') to (8) gives

$kr^2 - \lambda rv \ge k(r - \lambda) = k^2r - k\lambda v,$

$(kr - \lambda v)(r - k) \ge 0,$
$(r - \lambda)(r - k) \ge 0.$

It is trivial that $r - \lambda > 0$; hence

(10)   $r - k \ge 0,$

which is equivalent, by (1), to Fisher's inequality (3). Thus we conclude that
(5) and (3) are equivalent.

This completes the proof that inequalities (5) and (6) are in reality no more
general than (3) and (4).
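
The equivalences can also be checked by brute force over integer parameter sets. The sketch below assumes Nair's inequalities in the forms $b - 1 \ge k(r-1)^2/[(r-k)+\lambda(k-1)]$ and $b \ge kr(r-1)/[(r-k)+\lambda(k-1)]$, and it enumerates parameters that merely satisfy $bk = rv$ and $\lambda(v-1) = r(k-1)$, whether or not a design with those parameters exists. The function names and search limits are illustrative.

```python
def admissible_parameters(max_v=30, max_lambda=5):
    """Yield integer (v, b, r, k, lam) with bk = rv, lam(v-1) = r(k-1), 2 <= k < v."""
    for v in range(3, max_v + 1):
        for k in range(2, v):
            for lam in range(1, max_lambda + 1):
                if (lam * (v - 1)) % (k - 1):
                    continue
                r = lam * (v - 1) // (k - 1)
                if (r * v) % k:
                    continue
                yield v, r * v // k, r, k, lam


def equivalences_hold(v, b, r, k, lam):
    """Check (3) <=> (5) and (4) <=> (6) <=> (7) using integer arithmetic only."""
    d = (r - k) + lam * (k - 1)                 # common denominator, always positive
    fisher = b >= v                             # (3)
    bose = b >= v + r - 1                       # (4)
    nair_5 = (b - 1) * d >= k * (r - 1) ** 2    # (5), cross-multiplied
    nair_6 = b * d >= k * r * (r - 1)           # (6), cross-multiplied
    simple = r - lam >= k                       # (7)
    return fisher == nair_5 and bose == nair_6 == simple


parameter_sets = list(admissible_parameters())
print(len(parameter_sets), all(equivalences_hold(*p) for p in parameter_sets))
```

Cross-multiplying by the positive denominator keeps every comparison exact, so no rounding can mask a counterexample.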
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CORRECTIONS TO “THE SURPRISE INDEX FOR THE MULTIVARIATE 
NORMAL DISTRIBUTION” 


By I. J. GOOD


In the paper cited in the title (Ann. Math. Stat. Vol. 27 (1956), pp. 1130- 
1135): 

Sec. 1, line 4. For E read E; . (This was correct on some prints.) 

P. 1132, line 7. For Xo read X,, . 

Two lines above Sec. 4. For \ read \, . 

End of paper. The remark concerning Hotelling’s generalised “Student” test 
is misleading and should be deleted. 
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ABSTRACTS OF PAPERS 


(Additional abstracts of papers presented at the Atlantic City Meeting of the Institute,
September 10-13, 1957) 


1. Tests for Significance in Bivariate Harmonic Analysis, HAROLD HOTELLING,
University of North Carolina, AND DONALD F. MORRISON, National
Institute of Mental Health.


Detection of a common period in two or more observed variates, such as the radial velocity
and brightness of a star or a pair of economic variates, may be undertaken by means of
any of several generalizations of univariate periodogram analysis. Three such generalizations
are considered in this paper. Two are forms of the multivariate analysis of variance;
of these one uses the Wilks determinantal statistic, the other the Hotelling $T_0^2$. The third
statistic, originated by the junior author, has a distribution whose large-sample approximation
is relatively easy to handle. (Received June 10, 1957.)


2. Conditions that a Stochastic Process be Ergodic, EMANUEL PARZEN, Stanford
University.


In his thesis on statistical inference on stochastic processes, Grenander has pointed out
that "the concept of metric transitivity seems to be important in the problem of estimation
of a stationary stochastic process." In this note, we give necessary and sufficient
conditions in terms of characteristic functions that a strictly stationary (discrete or continuous
parameter) stochastic process $X(t)$ be metrically transitive or ergodic. More
importantly, we state a mean ergodic theorem (or weak law of large numbers) for stochastic
processes which are strictly stationary of order $k$, by which is meant that for every
choice of $k$ points $t_1, \cdots, t_k$, the random variables $X(t_1 + h), \cdots, X(t_k + h)$ have a joint
probability distribution which does not depend on $h$. With the aid of these theorems, one
can readily establish the following theorems: If $X(t)$ is a normal stationary process, a necessary
and sufficient condition for it to be ergodic is that its spectrum be continuous. If $X(t)$
is a linear stationary process, then it is ergodic. (Received June 13, 1957.)


3. Testing Homogeneity of Means in the Presence of Heterogeneity of Variance,
JOHN GURLAND AND LLOYD ROSENBERG.


A finite series representation for the distribution of statistics with a structure similar 
to that of the t-statistic is utilized in obtaining under simple restrictions the exact size of a 
test when variance heterogeneity is present. Further modification of the technique was 
utilized to obtain the exact power of the tests. Exact probabilities have also been compared 
with approximations based on an approximate numerator and/or an approximate denomi- 
nator in the ratios under consideration. The possibility of extending the techniques to the 
case of more than two samples is also considered. (Received June 21, 1957.)


4. Generalization of Steinhaus' Results on Fair Division, PETER NEWMAN,
University College of the West Indies.


In Econometrica, 1948, H. Steinhaus posed the problem of fair division. A non-homogeneous
object $X$ is to be divided among $n$ people, each of whom has a valuation function $v_i$,
assumed to be an increasing, bounded, countably additive, non-atomic, positive measure,
defined on some Boolean $\sigma$-algebra $S$ of subsets of $X$. Steinhaus asserted that there exists a
partition $\bigcup_{i=1}^{n} E_i$ such that, for each $i$, $v_i(E_i) \ge v_i(X)/n$. This was proved by K. Urbanik
(Fund. Math., 1954) who further showed that (i) provided that, for at least one pair $i, j$
and one $E \in S$, $v_i(E)/v_i(X) \ne v_j(E)/v_j(X)$, then there is at least one $k$ such that $v_k(E_k) >
v_k(X)/n$; (ii) provided further that the measures are equivalent, there exists a partition
$\bigcup_{i=1}^{n} F_i$ such that, for all $i$, $v_i(F_i) > v_i(X)/n$ ("good division"). It is shown by an elementary
and constructive proof that the condition that the $v_i$ be countably additive measures
can be replaced, for all three results, by the condition that they be sub-additive set functions.
If they are further assumed to be strictly sub-additive, assumption (i) above can also
be dropped. (Received May 27, 1957; revised June 26, 1957.)


5. Graphic Methods based upon Properties of Advancing Centroids, S. I.
ASKOVITZ, University of Pennsylvania.


The centroid is defined in elementary physics as the center of gravity or balance point of 
a composite mass. Centroids of sets of isolated points have found a number of applications 
in statistics. The fact that centroids can often be located quite readily by graphic methods 
has made them fairly useful. The simplest application is to the graphic determination of 
mean values. This can generally be carried out in a matter of a few seconds directly on the 
original graph, with a pencil and straightedge alone. An entire set of moving averages 
can likewise be drawn by the use of a single polygonal line, without any calculation. By 
considering combinations of unequally weighted points, methods have been developed for 
drawing the line of best fit according to the least squares criterion, again without computa- 
tion. The change in the least squares lines when new points are added can be worked out 
easily. The mean value and standard deviation of frequency distributions can also be de- 
termined entirely graphically. Other applications, for example, to rank correlation proce- 
dures, are being completed. (Received June 28, 1957.) 


6. On the Decomposition of Certain $\chi^2$ Variables, ROBERT V. HOGG AND
ALLEN T. CRAIG, University of Iowa, (By Title).


Let $Q = Q_1 + \cdots + Q_{k-1} + Q_k$, $k > 2$, where $Q_1, \cdots, Q_{k-1}, Q_k$ are real symmetric
quadratic forms in central or noncentral, stochastically independent or dependent, normal
variables. Let $Q, Q_1, \cdots, Q_{k-1}$ have central or non-central chi-square distributions with
parameters $r, \theta$ and $r_i, \theta_i$, $i = 1, \cdots, k - 1$, respectively, where $r$ and $r_i$ are the degrees
of freedom and $\theta$ and $\theta_i$ are the non-centrality parameters. It is proved that if $Q_k$ is a non-negative
quadratic form, then $Q_1, \cdots, Q_{k-1}, Q_k$ are mutually stochastically independent.
It follows immediately from the mutual stochastic independence that $Q_k$ has a chi-square
distribution with parameters $r_k = r - \sum_{1}^{k-1} r_i$, $\theta_k = \theta - \sum_{1}^{k-1} \theta_i$. (Received June 28, 1957.)


7. The Limiting Distribution of a Likelihood Ratio Test for the Serial Correlation
Coefficient, JOHN S. WHITE, Minneapolis-Honeywell Regulator
Company.


Let $x_t$ be a discrete Gaussian process satisfying the auto-regressive equation $x_t - \alpha x_{t-1} =
u_t$ $(t = 1, 2, 3, \cdots)$, where the $u$'s are NID $(0, \sigma^2)$, $|\alpha| < 1$ is an unknown parameter and
$x_0$ is a constant. It is shown that if $\lambda$ is the likelihood ratio for testing the hypothesis $H: \alpha =
\alpha_0$ against the alternative hypothesis $H': \alpha \ne \alpha_0$ then $-2 \log \lambda$ has a chi-squared distribution
with 1 df. This result also holds in the so-called explosive case; i.e., $|\alpha| > 1$. (Received
June 28, 1957.)

8. On the (Nonrandomized) Optimality of Symmetrical Designs, J. C. KIEFER,
Cornell University, (By Title).


Many commonly employed symmetrical designs such as Balanced Incomplete Block
Designs, Youden Squares (in particular, Latin Squares), etc., are shown to be optimum
among the class of non-randomized designs in situations where they are employed to test
($H_0$) absence of treatment effects. Letting $V_d$ = covariance matrix of best linear estimators
of treatment effects when design $d$ is used, the optimality criteria considered are: (1) minimization
of det $V_d$; (2) minimization of the largest eigenvalue of $V_d$; (3) maximization of
the minimum power (over all tests and designs) on a fixed contour; (4) the accomplishment
of (3) locally (i.e., to first order terms near $H_0$). Of these, (1) and (2) were demonstrated by
Wald and Ehrenfeld, respectively, for the Latin Square. The optimal nature of (2) involves
the tacit assumption that the F-test should be used, and since it is not generally true that
that test achieves (3) or (4) for a fixed design, criterion (2) has less intrinsic meaning. (3) is
generally difficult to verify because of the question of which test to use for each $d$. In
many settings where there is an appropriately symmetric design, (1) implies (2) and (4);
these last three criteria are verified in the cited examples. (Received July 2, 1957.)


9. On the Non-optimality of Symmetrical Designs among Randomized Designs,
J. C. KIEFER, Cornell University, (By Title).


The following is the simplest example of a general phenomenon. Suppose $X_{ij}$ independent
and normal with unit variance and mean $\mu_i$ $(i, j = 1, 2)$. Consider the problem of selecting
(before observation) exactly two of the $X_{ij}$ and using them to test $H_0: \mu_1 = \mu_2 = 0$ with
size $\alpha$. The "symmetrical" design $d$ selects $(X_{11}, X_{21})$ and uses the usual $\chi^2$ test with 2
degrees of freedom. Let $d'$ select $(X_{i1}, X_{i2})$ with probability 1/2 for each $i$ and use the $\chi^2$
test with 1 degree of freedom, whichever $i$ is chosen. It is shown that the power function of
$d'$ is uniformly greater than that of $d$ in a neighborhood of $H_0$. This is the simplest example
of a general phenomenon which persists for any number of populations and observations,
whether or not the variance is known. In cases like those where Balanced Incomplete Block
Designs, Youden Squares, etc., are usually employed to test that all contrasts (of treatment
effects) are 0, the same phenomenon persists as $\alpha \to 0$. The results are also true for other
distributions. (Received July 7, 1957.)


10. On an Optimal Property of Variance-components Estimates, WrrRNER 
Gautscut, Indiana University. 


Recently Graybill and Wortham (J. Amer. Stat. Assoc., Vol. 51 (1956), pp. 266-268) have 
stated the following result for balanced designs: Among all unbiased estimates for a vari- 
ance-component, the standard estimate, as given by the method of analysis of variance, has 
uniformly smallest variance. The authors have sketched a proof which, however, is not quite 
complete in various aspects. This pap: r presents a general method of proving results of the 
above type with applications to various particular designs. The method consists in three 
steps: 

(i) In order to find a sufficient and complete statistic $T$ for the variance-components,
an orthogonal transformation is applied which reduces the observation vector $y$ to a
"canonical" vector $z$ with independent components. This involves finding the eigenvalues and
eigenvectors of the covariance matrix $\Sigma_y$.

(ii) To avoid laborious transformations of quadratic forms, a lemma is given by means 
of which among all forms Q = z’Bz which are unbiased estimates for a variance-component, 
the form Q* with smallest variance is easily found. 

(iii) $Q^*$ depends on the observations only through $T$ and thus, according to Lehmann and
Scheffé (Sankhya, Vol. 10 (1950), pp. 305-340 and Vol. 15 (1955), pp. 219-236), has uniformly
smallest variance among all unbiased estimates. No transformation backwards is needed, since
$Q^*$ is seen to have the same distribution as the standard estimate in $y$. (Received July 8, 1957.)


11. Quantization for Least Mean Squares Error, Stuart P. Lloyd, Bell Tele-
phone Laboratories.


A quantizing scheme for a real random variable $X$ consists of a partition $\{Q_1, Q_2, \ldots, Q_\nu\}$
of the range of $X$ together with a set $\{q_1, q_2, \ldots, q_\nu\}$ of representative values. An
observation falling in $Q_\alpha$ is reported as $q_\alpha$, $\alpha = 1, 2, \ldots, \nu$. With the number $\nu$ of quanta
preassigned, and the c.d.f. $F$ of $X$ given, one seeks the $\{Q_\alpha\}$ and $\{q_\alpha\}$ which minimize the
mean squared quantization error $\sum_\alpha \int_{Q_\alpha} (x - q_\alpha)^2 \, dF(x)$. Necessary conditions obtained
are (1) $q_\alpha$ is the center of mass [$dF$] of $Q_\alpha$, $\alpha = 1, 2, \ldots, \nu$, and (2) modulo sets of measure
zero [$dF$], the $\{Q_\alpha\}$ are intervals whose endpoints bisect the segments between adjacent
$\{q_\alpha\}$. Two trial-and-error methods for finding such $\{Q_\alpha\}$ and $\{q_\alpha\}$ are described. The non-
sufficiency of conditions (1) and (2) is demonstrated. For suitably restricted $F$, asymptotic
properties for large $\nu$ are given. (Received July 8, 1957.)
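
Conditions (1) and (2) suggest a natural alternating iteration. The sketch below applies it to an empirical sample standing in for $dF$; the abstract does not say whether this matches either of the two trial-and-error methods in the paper, and the function names and the equally spaced initialization are assumptions made for illustration.

```python
def lloyd_quantizer(sample, levels, iters=100):
    """Alternate between conditions (2) and (1) on an empirical distribution."""
    xs = sorted(sample)
    lo, hi = xs[0], xs[-1]
    # Hypothetical starting point: equally spaced representative values.
    q = [lo + (hi - lo) * (a + 0.5) / levels for a in range(levels)]
    for _ in range(iters):
        # Condition (2): cell endpoints bisect adjacent representative values.
        cuts = [(q[a] + q[a + 1]) / 2 for a in range(levels - 1)]
        cells = [[] for _ in range(levels)]
        for x in xs:
            cells[sum(x > c for c in cuts)].append(x)
        # Condition (1): each representative is the center of mass of its cell
        # (an empty cell keeps its previous representative).
        new_q = [sum(c) / len(c) if c else q[a] for a, c in enumerate(cells)]
        if new_q == q:
            break
        q = new_q
    cuts = [(q[a] + q[a + 1]) / 2 for a in range(levels - 1)]
    return q, cuts


def mean_squared_error(sample, q):
    """Mean squared quantization error when each x is reported as its nearest q."""
    return sum(min((x - v) ** 2 for v in q) for x in sample) / len(sample)
```

A fixed point of this iteration satisfies both necessary conditions, but, as the abstract notes, those conditions are not sufficient, so the result need not be the global minimizer.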


12. Tests of Multiple Independence and the Associated Confidence Bounds, 
S. N. Roy and R. E. Bargmann, University of North Carolina.


In this paper a test based on the union-intersection principle is proposed for over-all 
independence between p variates or p sets of variates with a multivariate normal distribu- 
tion. Methods used in earlier papers have been applied to invert these tests for each situa- 
tion and to obtain, with a joint confidence coefficient greater than or equal to a preassigned 
value, simultaneous confidence bounds on certain parametric functions measuring depar-
tures from independence of variate 1 or the set (1) with variates 2, 3, ..., p or the sets
(2), (3), ..., (p); variate 2 or the set (2) with variates 3, 4, ..., p or the sets (3), (4), ...,
(p); and so on. One of the objects of these confidence bounds is the detection of the "culprit
variates" in the case of rejection of the "complex" hypothesis of multiple independence;
for the "complex" hypothesis is, in this case, the intersection of several more "elementary"
hypotheses of two-by-two independence. (Received July 8, 1957.)


13. Confidence Bounds on the “Ratio of Means” and “Ratio of Variances” 
for Correlated Variates, S. N. Roy and R. F. Potthoff, University of
North Carolina. 


In this paper confidence bounds are obtained (i) on the ratio of variances of a (possibly) 
correlated bivariate normal distribution, and then, by generalization, (ii) on a set of para- 
metric functions of a (possibly) correlated p + p variate normal distribution, which plays 
the same role for a 2p-variate distribution as the ratio of variances does for the bivariate 
case, (iii) on the ratio of means of the distribution indicated in (i) and, by generalization, 
(iv) on a set of parametric functions of the distribution indicated in (ii), which plays the 
same role for this problem as the ratio of means does for the bivariate case. For (i) and 
(iii) the confidence coefficient is any preassigned $1 - \alpha$, and the distribution involved is the
central t-distribution, while for (ii) and (iv) the confidence statement is a simultaneous one
with a joint confidence coefficient greater than or equal to a preassigned $1 - \alpha$. For (ii)
the distribution involved is that of the central largest canonical correlation coefficient
(squared), and for (iv) the distribution involved is that of the central Hotelling's $T^2$.
(Received July 8, 1957.)


14. On Aggregation and Consolidation in Finite Substochastic Systems, I.,
David Rosenblatt, American University, (By Title).


We call a system $x(I - A) = w$ a finite substochastic system if the $n \times n$ matrix $A$ is
substochastic (row sums $\le 1$) and $w$ is nonnegative; if $w \ne \theta$, the null vector, without loss
of generality we take $w$ to be a stochastic vector. A solution $x$ of a substochastic system is
called admissible if $x$ is finite, nonnegative, but not null. An aggregation matrix $C$ is an $n \times r$
stochastic matrix with exactly one positive entry in each row, $1 \le r < n$. A weight matrix
$E$ is an $n \times n$ diagonal matrix containing nonnegative entries on the diagonal. A consolida-
tion of a substochastic matrix is an $r \times r$ matrix $B(C; E; A) = (C'EC)^{-1}C'EAC$, where
$C'EC$ is regular, $1 \le r < n$. A consolidation $B(C; E; A)$ is said to be representative for a
given system $x(I - A) = w$ in respect to an admissible solution $\bar{x}$ if and only if

$$\bar{x}(I - A)C = \bar{x}C(I - B(C; E; A)).$$

A consolidation is said to be canonical for a given system $x(I - A) = w$ if and only if it is
representative in respect to all admissible solutions of the system.

THEOREM 1: Consider two finite-dimensional stationary Markov chains characterized by
$\{x_0;\ x_{k+1} = x_k A \mid k = 0, 1, \ldots\}$, $\{y_0;\ y_{k+1} = y_k B \mid k = 0, 1, \ldots\}$ and an aggregation
matrix $C$ such that $y_0 = x_0 C$. A necessary and sufficient condition that $y_k = x_k C$ for all $k$ and
any $x_0$ is that $AC = CB$. $B$ is necessarily a canonical consolidation of $x(I - A) = \theta$, in-
variant for all weight matrices $E$ consistent with $(C'EC)$ regular. (Received July 8, 1957.)
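
The condition $AC = CB$ and the formula for $B(C; E; A)$ can be illustrated on a small case. The sketch below is only an illustration under stated assumptions: the 3-state matrix, the 0/1 aggregation matrix, and the helper names are hypothetical and not taken from the abstract.

```python
def matmul(P, Q):
    """Plain-list matrix product."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def consolidation(C, weights, A):
    """B(C; E; A) = (C'EC)^{-1} C'EAC with E = diag(weights).

    Each row of C has a single positive entry, so C'EC is diagonal and
    its inverse is taken entry by entry (it must have no zero diagonal entry).
    """
    n = len(weights)
    E = [[weights[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    CtE = matmul(transpose(C), E)
    G = matmul(CtE, C)                 # C'EC
    H = matmul(CtE, matmul(A, C))      # C'EAC
    return [[H[d][j] / G[d][d] for j in range(len(H[0]))] for d in range(len(G))]


def close(P, Q, tol=1e-12):
    return all(abs(p - q) <= tol for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


# Hypothetical example: states 2 and 3 are merged into one aggregate state.
A = [[0.5, 0.25, 0.25],
     [0.2, 0.30, 0.50],
     [0.2, 0.60, 0.20]]
C = [[1, 0], [0, 1], [0, 1]]
B_equal = consolidation(C, [1.0, 1.0, 1.0], A)
B_other = consolidation(C, [1.0, 3.0, 1.0], A)
```

In this example `matmul(A, C)` agrees with `matmul(C, B_equal)`, and changing the weights leaves the consolidation unchanged, which is the invariance the theorem describes.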


15. On Aggregation and Consolidation in Finite Substochastic Systems, II., 
David Rosenblatt, American University, (By Title).


Let C be an aggregation matrix. For any column of C containing more than one positive 
entry, the index set of the rows containing the positive entries will be called an aggregation 
set of C. For any finite stochastic matrix, we distinguish (exhaustively) between transient 
and ergodic indices and call any closed set of indices an ergodic set of indices. For con- 
venience, call all (transient or ergodic) indices connected (in the directed graph of the sto- 
chastic matrix) to indices of a given ergodic set, the associated indices of that set. THEOREM 
2: Let $x(I - A) = \theta$ be a stochastic system involving two or more ergodic sets of indices. Let a
canonical consolidation $B(C; E; A)$ exist for the system, $E$ regular. The following then obtains:
If there exists an aggregation set of $C$ containing one or more associated indices of each of the
ergodic sets in a collection $\{H_1, \ldots, H_p;\ p \ge 2\}$ and containing at least one index of an
ergodic set $H_j$, then $x_j C = x_h C$ ($j, h = 1, \ldots, p$) holds in the stationary stochastic vectors
for these sets. Consequently, each index of every ergodic set in the collection $\{H_1, \ldots, H_p\}$
must be contained in some aggregation set of $C$. Moreover, all ergodic sets in the collection
exhibit one or more indices in any given aggregation set, or none do. (Received July 8, 1957.)


16. On Aggregation and Consolidation in Finite Substochastic Systems, III., 
David Rosenblatt, American University, (By Title).


THEOREM 3: Let $x(I - A) = \theta$ be a stochastic system. Let a canonical consolidation exist
for the system for $E$ regular and given $C$. Let the associated points of each ergodic set be repre-
sented in at most one aggregation set of $C$. Then $C$ determines an invariant canonical con-
solidation of the system if and only if each aggregation set contains at least one ergodic index.
Let $V_d$ denote the $d$th aggregation set of an arbitrary aggregation matrix $C$, $V_d$ contain-
ing $m_d$ indices, $d = 1, \ldots, g$. Let $u = \sum_{d=1}^{g} m_d$. Let $(I - A)_C$ denote the distinguished
$u \times n$ submatrix of $(I - A)$ with row indices belonging to aggregation sets of the
$n \times (g + n - u)$ aggregation matrix $C$. Let $f_j$ denote the diagonal elements of a weight matrix
$E$, $j = 1, \ldots, n$. For convenience, we extend the preceding definitions to consideration of
finite nonnegative systems, $x(\alpha I - A) = w$, $A$ and $w$ nonnegative and $\alpha$ a positive scalar.
THEOREM 4: Let $x(\alpha I - A) = w$ be a nonnegative system with one or more admissible solu-
tions. A consolidation $B(C; E; A)$ is a representative consolidation of the system in respect to
an admissible solution $\bar{x}$ of the system if and only if the following holds:

$$\sum_{d=1}^{g} \sum_{i \in V_d} \Bigl( \bar{x}_i - f_i \sum_{e \in V_d} \bar{x}_e \Big/ \sum_{e \in V_d} f_e \Bigr) d_{iq} = 0$$

for all $q = 1, \ldots, (g + n - u)$, where $d_{iq}$ is the typical element of $(\alpha I - A)_C\, C$. This result
permits treatment of aggregation and consolidation in a significant specialization of the
von Neumann model of economic equilibrium. (Received July 8, 1957.)


17. On Aggregation and Consolidation in Finite Substochastic Systems, IV., 
David Rosenblatt, American University, (By Title).


THEOREM 5: Let $A$ be a substochastic matrix of order $n \ge 3$. Let a system $x(I - A) = w$
have admissible solutions not all of which are positive only in a single fixed component. Any
consolidation $B(C; E; A)$ is canonical for the system if and only if $A$ has the form

$$A = \alpha I + \beta U,$$

where $U$ is stochastic with all rows identical, and $\alpha + \beta \le 1$. The present inquiry is partly
motivated by phenomenological models of substochastic variety in mathematical eco-
nomics and econometrics, e.g., certain interindustrial (input-output) models, intersec-
toral trade or exchange models, models of macroeconomic stability. The underlying
abstract conception of these models goes back to the stationary process of the Tableau Eco-
nomique of François Quesnay (published 1758). In many of these models, the coefficients $a_{ij}$
of $A$ are macrostatistical observables representing empirical economic "flows." We are led
to the following proposal in certain applied input-output contexts, in place of matrix inver-
sion, consolidation, or both. Given $x_0(I - A) = w_0$, $A$ substochastic and $(I - A)$ regular,
where $x_0$, $A$, and $w_0$ are empirically given. Required to find $x$ for $x(I - A) = w$ for given
$w$. Assume $w \ne \mu w_0$, $\mu$ scalar. Let $A_{w_0}$, $A_w$ denote the matrices for the corresponding sto-
chastic systems. Compute the stationary stochastic vector $\bar{x}(w)$ by use of the iterative
scheme $\bar{x}_{k+1}(w) = \bar{x}_k(w) A_w$ ($k = 0, 1, 2, \ldots$); $x$ is computable from $\bar{x}(w)$. $A_w^k$ is always
convergent in practice. The present inquiry employs a graph-theoretic approach to prob-
lems of aggregation and consolidation in linear systems. (Received July 8, 1957.)


18. Bayes Acceptance Sampling Procedures for Large Lots, Donald Guthrie,
Jr. and M. V. Johns, Jr., Stanford University.


A lot of N items is accepted or rejected on the basis of a sample of fixed size n. The con- 
sequences of acceptance or rejection are appraised in terms of economic costs consisting of 
a cost of inspection, and a cost due to accepting or rejecting the lot. If the lot is accepted 
then the cost due to passing each uninspected item is proportional to a random variable 
associated with that item. This random variable is assumed to have a distribution which is a 
member of an exponential family over which an a priori probability distribution is defined. 
If the lot is rejected, the cost is proportional to the number of items in the uninspected 
remainder of the lot. Explicit asymptotic expressions are given characterizing the Bayes 
rejection procedures and sample sizes for large values of the lot size N. (Received July 10, 
1957.) 


19. On the Equality of the Variances of Several Univariate Normal Popula- 
tions and some Multivariate Extensions, R. GNANADESIKAN, University 
of North Carolina. 

Suppose we have independent random samples of sizes $n_0, n_1, \ldots, n_k$, respectively,
from $N(\mu, \sigma^2)$, $N(\mu_1, \sigma_1^2)$, $\ldots$, and $N(\mu_k, \sigma_k^2)$ where the means and the variances are un-
known. Using the heuristic union-intersection principle, a test is derived for the null hy-
pothesis $H_0: \sigma_1^2 = \cdots = \sigma_k^2 = \sigma^2$, and some power properties of the test are studied. The
associated simultaneous confidence bounds, with a joint confidence coefficient $(1 - \alpha)$,
on $\sigma_1^2/\sigma^2, \ldots$ and $\sigma_k^2/\sigma^2$ are then obtained. Next, for the corresponding multivariate problem
of testing the null hypothesis of equality of dispersion matrices, i.e.,
$H_0: \Sigma_1(p \times p) = \cdots = \Sigma_k(p \times p) = \Sigma(p \times p)$, a test is proposed and the associated
simultaneous confidence bounds, with a joint confidence coefficient $\ge (1 - \alpha)$, on the
characteristic roots $c(\Sigma_1\Sigma^{-1}), \ldots, c(\Sigma_k\Sigma^{-1})$ are obtained. (Received July 11, 1957.)


20. Further Contributions to Confidence Bounds on Multivariate Variance 
Components, S. N. Roy and R. Gnanadesikan, University of North
Carolina. 


Under the general multivariate linear hypotheses model or Model I, for a restricted k-
way classification (including the multivariate analogues of the usual complete and incom-
plete block connected designs), $k$ matrices, $S_1, \ldots, S_k$, due to the hypotheses of equality
of the row vectors of $\xi_i(m_i \times p)$ (for $i = 1, 2, \ldots, k$) and a matrix, $S_0$, due to error are
obtained. Next, for the multivariate variance components model, the $k$ sets $\xi_i(m_i \times p)$'s
(for $i = 1, 2, \ldots, k$) are treated as random components from $k$ independent p-variate nor-
mal populations $N[\mu_i, \Sigma_i]$ ($i = 1, 2, \ldots, k$) while the error component is from $N[0, \Sigma]$
with $\Sigma_i = \sigma_i^2 \Sigma$. Under this model, when the matrices $S_0, S_1, \ldots, S_k$, obtained under
Model I, satisfy the conditions for being distributed mutually independently in central
pseudo-Wishart forms with appropriate degrees of freedom, then a set of simultaneous
confidence bounds, with a joint confidence coefficient $\ge (1 - \alpha)$, is obtained on $\sigma_1^2, \ldots,
\sigma_k^2$ and all the characteristic roots, $c(\Sigma)$, of the matrix $\Sigma$. Next, even when the matrices
$S_1, \ldots, S_k$ are not mutually independent, if they are independent of $S_0$, and if they all
have central pseudo-Wishart distributions with appropriate degrees of freedom, then an
alternate set of separate confidence bounds for the individual $\sigma_i^2$'s ($i = 1, 2, \ldots, k$) is ob-
tained with exact confidence coefficients. (Received July 11, 1957.)


21. A Table of the Expected Value of the Quasi-range, H. Leon Harter, 
Wright Air Development Center. 


The $r$th quasi-range, $w_r$, of a sample of $n$ is defined as the range of $(n - 2r)$ sample
values, omitting the $r$ largest and the $r$ smallest. Symbolically, $w_r = x_{n-r} - x_{r+1}$, where
$x_1, x_2, \ldots, x_n$ are the ordered sample values. The expected value of the $r$th quasi-range
for samples of $n$ from the standard normal distribution $N(0, 1)$ is given by the relation

$$E(w_r) = 2(r + 1)\binom{n}{r + 1}\int_{-\infty}^{\infty} x\,[\tfrac{1}{2} - \Phi(x)]^{r}\,[\tfrac{1}{2} + \Phi(x)]^{n-r-1}\,\varphi(x)\,dx,$$

where $\varphi(x) = (1/\sqrt{2\pi})\,e^{-x^2/2}$ and $\Phi(x) = \int_0^x \varphi(x)\,dx$. Tables of $E(w_r)$, accurate to
within one in the sixth decimal place, are given for $n = 2(1)100$, $r = 0(1)8$. These tables were
computed by numerical integration (trapezoidal rule), using the Burroughs E101 computer.
The use of sample quasi-ranges in estimating the population standard deviation is discussed.
(Received July 12, 1957.)
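
The relation for $E(w_r)$ is easy to evaluate with the trapezoidal rule mentioned in the abstract. The following is a minimal sketch; the truncation of the integral to $[-10, 10]$, the step count, and the function name are assumptions chosen for illustration, not details of the original computation.

```python
import math


def expected_quasi_range(n, r, half_width=10.0, steps=20000):
    """Trapezoidal-rule estimate of E(w_r) for samples of n from N(0, 1)."""

    def phi(x):
        return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

    def Phi0(x):
        # Integral of phi from 0 to x (so the c.d.f. is 1/2 + Phi0(x)).
        return 0.5 * math.erf(x / math.sqrt(2))

    def integrand(x):
        return (x * (0.5 - Phi0(x)) ** r
                * (0.5 + Phi0(x)) ** (n - r - 1) * phi(x))

    h = 2 * half_width / steps
    total = 0.5 * (integrand(-half_width) + integrand(half_width))
    for i in range(1, steps):
        total += integrand(-half_width + i * h)
    return 2 * (r + 1) * math.comb(n, r + 1) * h * total
```

For $n = 2$, $r = 0$ the result can be compared with the known expected range $2/\sqrt{\pi}$ of two standard normal observations.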


22. A Generalization of the Discriminant Function Analysis (Preliminary 
Report), M. M. Rao, University of Minnesota, (By Title). Introduced by 
R. C. Bose. 


Let $x_r$, $x_s$ be row vectors of $p$ components each of the $r$th and $s$th individuals of two
random samples drawn from p-variate normal populations specified by $N(E(x), \Sigma)$ where
$E(x_{ir}) = \xi_{i1} + \alpha_{i1}t_{1r} + \cdots + \alpha_{ik}t_{kr}$ and $E(x_{js}) = \xi_{j2} + \beta_{j1}t_{1s} + \cdots + \beta_{jk}t_{ks}$
($i, j = 1, 2, \ldots, p$; $r = 1, 2, \ldots, n_1$; $s = 1, 2, \ldots, n_2$) so that all components have regressions of the
same, $k$th, order. The $t$'s are fixed variates; e.g., in biological data, such as growth, the re-
gression is age trend. We shall also consider $q$ samples instead of two. The problem is to
derive a statistic to test the differences between the samples. In this paper tests for two and
$q$ sample cases are derived which, on setting $\alpha$'s and $\beta$'s zero, reduce to Fisher's test of dis-
criminant function (1938) and Wilks' $\Lambda$ criterion (see C. R. Rao's book, 1952, p. 262), respec-
tively. Also a set of discriminant functions for $q$ samples is obtained simultaneously. We
note, however, that the possible ${}^{q}C_2 = q_1$, say, discriminant functions cannot always be ob-
tained (if $q > 2$). Only $s$ of them will come out, where $s = \operatorname{rank} C$, the hypothesis matrix;
i.e., $C\xi = 0$, $C(q_1 \times r)$ and $\xi(r \times p)$, where $r = q(k + 1)$, is the matrix of the parameters;
for example, if $q = 3$, $k = 2$, then $s = 2$. An illustration is also considered. (Received July
12, 1957.)


23. Distribution of a Serial Correlation Coefficient Near the Ends of the Range, 
M. M. Siddiqui, University of North Carolina, (By Title).


If $x_1, \ldots, x_n$ are observations on a stationary time series at equal intervals of time and
it is known that $E x_t = 0$ for $t = 1, \ldots, n$, we may define a serial correlation coefficient
with lag unity by $r^* = (\sum_{t=1}^{n-1} x_t x_{t+1}) / [(\sum_{t=1}^{n-1} x_t^2)(\sum_{t=1}^{n-1} x_{t+1}^2)]^{1/2}$. Assuming the observations to
be distributed independently as $N(0, 1)$ variates, a geometrical approach suggested by Hot-
elling (American Journal of Mathematics, 1939) is utilized to obtain the order of contact
of the distribution curve at $r^* = \pm 1$. It is shown that if for a number $r_0$ in $[0, 1]$ and close
to 1, $P(r^* \ge r_0)$ is expanded in a series of powers of $(1 - r_0)$, the first non-zero coefficient,
$I_n$, is that of the power $(n - 2)/2$. Bounds on the value of $I_n$ are obtained to be 0.435 and
0.638. (Received July 15, 1957.)


24. Jacobi Polynomials and Distributions of Some Serial Correlation Coeffi-
cients, M. M. Siddiqui, University of North Carolina, (By Title).


If $y$ is a variate with range $[c_1, c_2]$, where $-1 \le c_1 \le c_2 \le 1$, and the moment generating
function of $y$, $\chi_y(t)$, can be written in the form

$$\chi_y(t) = e^{-t}\sum_k a_k t^k F(\beta + k + 1,\ \alpha + \beta + 2k + 1,\ 2t),$$

where $\alpha, \beta > -1$ and $F(a, b, z)$ is a confluent hypergeometric function, then, under certain
conditions, the pdf of $y$ is given by $p(y) = f(y; \alpha, \beta)\sum_k b_k P_k^{(\alpha,\beta)}(y)$; here
$f(y; \alpha, \beta) = C(1 - y)^{\alpha}(1 + y)^{\beta}$, $C^{-1} = 2^{\alpha+\beta+1}B(\alpha + 1, \beta + 1)$, $P_k^{(\alpha,\beta)}(y)$ is the $k$th degree Jacobi
polynomial associated with $f(y; \alpha, \beta)$ and $b_k$ is determined by $a_k$. Slight modifications in
the form of $\chi_y(t)$ and $p(y)$ are necessary if the range of $y$ is between 0 and 1. Let $x_1, \ldots, x_n$
be independent $N(0, \sigma)$ variates. Defining serial correlation coefficients by

$$r_s = \Bigl(\sum_t x_t x_{t+s}\Bigr) \Big/ \Bigl(\sum_t x_t^2\Bigr),$$

$s = 1, 2, \ldots$, the moments of $r_s$ are used to express its moment generating function in the
required form and hence an approximation to its distribution is obtained as a product of a
beta distribution and a series of Jacobi polynomials. It is proved that the series is asymp-
totic. By an easy generalization of this method, an approximation to the bivariate dis-
tribution of $r_1$ and $r_2$ is also obtained. (Received July 15, 1957.)
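
For concreteness, the serial correlation coefficient $r_s$ can be computed as follows. This is a minimal sketch: the summation limits are not legible in the abstract, so the non-circular numerator over $t = 1, \ldots, n - s$ is an assumption, as is the function name.

```python
def serial_correlation(x, s):
    """r_s = sum_t x_t x_{t+s} / sum_t x_t^2.

    Assumption: the numerator runs over t = 1..n-s (no circular wrap-around),
    while the denominator uses all n observations.
    """
    if not 0 < s < len(x):
        raise ValueError("lag s must satisfy 0 < s < len(x)")
    numerator = sum(x[t] * x[t + s] for t in range(len(x) - s))
    denominator = sum(v * v for v in x)
    return numerator / denominator
```

A circular definition (treating $x_{n+t}$ as $x_t$) would change only the range of the numerator sum.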


25. Age-dependent Branching Stochastic Processes in Cascade Theory—II. 
Case of Transformation Probabilities a Function of Absorber Depth, 
W. Max Woods, Stanford University and A. T. Bharucha-Reid, Uni-
versity of Oregon, (By Title).


In this paper we consider a simple model for an electron-photon cascade in which the
transformation probabilities are functions of the absorber depth. This model is developed
within the framework of the Waugh generalisation of the Bellman-Harris process. In par-
ticular, we assume $q_0(t) = 1 - \beta \exp(-\alpha t)$, $q_2(t) = \gamma \exp(-\alpha t)$, $q_1(t) = 1 - (q_0(t) + q_2(t))$,
$\alpha > 0$, $0 \le \beta \le 1$, $0 \le \gamma \le 1$, where $q_i(t)$ is the probability that $i$ electrons will be formed
when transformation takes place at thickness $t$. The first and second moments of the prob-
ability distribution of the number of electrons in the cascade are obtained and their proper-
ties discussed. An expression which gives the probability that the cascade will terminate is
also obtained. (Received July 15, 1957.)


26. An Extension of the Theory of Cumulative Frequence Functions, Bernard
J. Derwort, North American Aircraft Corporation and Waldo A.
Vezeau, St. Louis University, (By Title).


This work provides an extension in three areas: (1) some of the known theory for func-
tions of one variable is extended to functions of two variables, (2) new theory for functions
of one variable is developed, namely, a moment-generating function and cumulative semi- 
invariants, (3) nine classes of new cumulative functions of one variable are developed. 
(Received July 16, 1957.) 


27. Some Renewal Processes Related to Types I and II Counter Models, 
Ronald Pyke, Stanford University.


The Type I and Type II counter problems (cf. W. Feller, ‘‘On probability problems in 
the theory of counters’’, Courant Anniversary Volume (1948), pp. 105-115) with arbitrary 
input and deadtime are studied. Let $\{X_i\}$, $\{Y_i\}$ be independent sequences of independent
identically distributed non-negative random variables (i.e., independent renewal proc-
esses). Let $S_0 = 0$, $S_k = X_1 + X_2 + \cdots + X_k$ ($k \ge 1$) and define recursively $n_0 = m_0 = 0$,
$n_j = \min\{k > n_{j-1}: S_k > Y_j + S_{n_{j-1}}\}$ and

$$m_j = \min\{k > m_{j-1}: S_k > S_i + Y_i,\ i = m_{j-1}, \ldots, k - 1\}.$$

The probability distributions of the $n$- and $m$-processes thus defined are obtained. Define
$Z_j = S_{n_j}$ and $V_j = S_{m_j}$ ($j \ge 1$). The $Z$- and $V$-processes thus defined are renewal processes
associated respectively with Type I and Type II counter models. The distribution and
characteristic functions for the $Z$-process are obtained explicitly. An integral equation
determining the characteristic function for the $V$-process is derived. Other quantities
connected with these processes are also studied. In particular,

$$\mathrm{Prob}\{Z_k + Y_k \le t < Z_{k+1} \text{ for some } k \ge 1\}$$

and a similar quantity with $Z$ replaced by $V$ are derived explicitly as well as their limits
as $t \to \infty$. Several examples are listed. A more general counter model proposed by Albert
and Nelson (Ann. Math. Stat., Vol. 24 (1953), pp. 9-22) is also studied and its solution is
given explicitly in terms of the solution of a corresponding Type II problem. (Received
July 17, 1957.)


28. Contributions to the Theory of Random Mappings, Bernard Harris,
Stanford University. 


A random mapping space $(X, \mathcal{T}, P)$ is a triplet, where $X$ is a finite set of elements $x$ of
cardinality $n$, $\mathcal{T}$ is a set of transformations $T$ of $X$ into $X$, and $P$ is a probability measure
over $\mathcal{T}$. If $x \in X$ and $T \in \mathcal{T}$, $T^k x$ is defined as the $k$th iteration of $T$ performed on $x$, where
$k$ is an integer either positive or negative. If for some $k \ge 0$, $T^k x = y$, then $y$ is said to
be a $k$th image of $x$ in $T$. The set of successors of $x$ in $T$, $S_T(x)$, is defined as the set of all
images of $x$, i.e., $S_T(x) = \{x, Tx, \ldots, T^{n-1}x\}$, which need not all be distinct elements.
If for some $k < 0$, $T^k x = y$, then $y$ is said to be a $k$th preimage of $x$ in $T$. The set of all
$k$th preimages of $x$ in $T$ is $P_T^{(k)}(x)$ and $P_T(x) = \bigcup_k P_T^{(k)}(x)$ is the set of predecessors of
$x$. If there exists an $m > 0$ such that $T^m x = x$, then $x$ is a cyclical element in $T$ and the
set of elements $x, Tx, \ldots, T^{m-1}x$ is the cycle containing $x$, $C_T(x)$. If there exists a pair
of integers $k$, $l$ such that $T^k x = T^l y$, then $x \sim y$ under $T$. The resulting equivalence classes are
called the components of $T$. The author considers several choices of $\mathcal{T}$, in each case choos-
ing $P$ as the uniform distribution over $\mathcal{T}$, and computes the distributions of the number
of elements in $S_T(x)$, the number of elements in the cycle in the component containing $x$,
the number of cyclical elements, the number of elements in $P_T(x)$, and the number of
components of $T$. (Received July 17, 1957.)
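
The quantities defined above can be computed directly for any given mapping. The sketch below represents $T$ as a Python list on $X = \{0, \ldots, n - 1\}$; the function names are hypothetical, predecessors are taken over $k < 0$ only, and the uniform choice over all $n^n$ mappings is just one possible choice of $\mathcal{T}$.

```python
import random


def successors(T, x):
    """S_T(x): x, Tx, T^2 x, ... listed in order of first appearance."""
    seen, order = set(), []
    while x not in seen:
        seen.add(x)
        order.append(x)
        x = T[x]
    return order


def cycle(T, x):
    """The cycle in the component containing x."""
    order = successors(T, x)
    first_repeat = T[order[-1]]      # first element reached a second time
    return order[order.index(first_repeat):]


def predecessors(T, x):
    """All y with T^k y = x for some k >= 1."""
    return sorted(y for y in range(len(T)) if x in successors(T, T[y]))


def components(T):
    """Equivalence classes of x ~ y; two elements are equivalent iff they share a cycle."""
    by_cycle = {}
    for x in range(len(T)):
        by_cycle.setdefault(min(cycle(T, x)), []).append(x)
    return sorted(by_cycle.values())


def random_mapping(n, seed=None):
    """One draw from the uniform distribution over all n^n mappings of X into X."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]


# Hypothetical example: 0 -> 1 -> 2 -> 0, 3 -> 0, 4 -> 3, and 5 is a fixed point.
T_example = [1, 2, 0, 0, 3, 5]
```

Repeating these computations over many draws of `random_mapping` gives empirical counterparts of the distributions the abstract derives.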


29. Tabulation of the Trivariate Normal Integral, Preliminary Report, George
P. Steck, Sandia Corporation, (By Title).


Let $F(h, k, m) = \int_{-\infty}^{h}\int_{-\infty}^{k}\int_{-\infty}^{m} dT(x, y, z; \rho_{12}, \rho_{13}, \rho_{23})$, where $T(x, y, z; \rho_{12}, \rho_{13}, \rho_{23})$ is the tri-
variate normal density function with zero means, unit variances, and correlation coeffi-
cients $\rho_{12}, \rho_{13}, \rho_{23}$. $F(h, k, m)$ can be expressed as a sum of 3 univariate normal integrals
plus the sum of 6 T-functions (the T-function has been tabulated by D. B. Owen, Ann.
Math. Stat., Vol. 27, No. 4) plus the sum of 6 integrals of the form

$$S(m, a, b) = (1/\sqrt{2\pi})\int_{-\infty}^{m} e^{-x^2/2}\, T(ax, b)\, dx.$$

The function $S(m, a, b)$ has been tabulated by numerical integration (but not checked)
for $m = 0(.1)5.0$; $a = 0(.1)[\sqrt{25 - m^2}/10m]$, $b = 0(.1)1.0$. The method used for expressing
the trivariate normal integral as a function of three variables applies equally well in ex-
pressing the n-dimensional normal integral as a function of n variables. (Received July
18, 1957.)


30. Exact Probabilities and Asymptotic Relationships for Some Statistics from 
m-th Order Markov Chains, Leo A. Goodman, University of Chicago.


Exact formulas are given for the joint probability distribution of the set of observed
m-tuple frequencies ($m \ge 1$) in an observed sequence $\{X_1, X_2, \ldots, X_N\}$ from a $(m - 1)$-
th order Markov chain with a denumerable number of states. Formulas are also presented
for the conditional distribution of the set of m-tuple frequencies, given the set of n-tuple
frequencies, in a sequence from a chain of order $\le n - 1$. If the chain is of order $< n - 1$,
and has a finite number $s$ of states, the conditional probability (of the m-tuple frequen-
cies, given the n-tuple frequencies), when regarded as a statistic computed from the
observed sequence, is asymptotically equivalent to the joint probability (regarded as a sta-
tistic) of a corresponding set of observed cell entries in a set of $s^{n-1}$ independent contin-
gency tables with fixed marginal totals (each table has $s^{m-n}$ rows and $s$ columns), where
independence in each table is assumed. Several simplified tests, related to standard tests
of independence in contingency tables, are given for the null hypothesis $H_{n-1}$ that the
chain is of order $n - 1$ against the alternate hypothesis $H_{m-1}$. Results of P. G. Hoel (Bio-
metrika, Vol. 41 (1954), pp. 430-439), P. Whittle (J. Roy. Stat. Soc., B, Vol. 17 (1955), pp.
235-242), and R. Dawson and I. J. Good (Ann. Math. Stat., Vol. 28 (1957)) are generalized
herein. (Received July 18, 1957.)


31. Asymptotic Distributions of Some Goodness of Fit Criteria for m-th Order
Markov Chains, Leo A. Goodman, University of Chicago, (By Title).


I. J. Good's review, for Mathematical Reviews, of P. Billingsley's article in Ann. Math.
Stat., Vol. 27 (1956), pp. 1123-1129, proposed two conjectured generalizations to $m$th order
Markov chains ($m \ge 0$) of Billingsley's results for zeroth order stationary chains. It is
proved herein that the first conjecture is correct, while the second is not. Billingsley's re-
sults used the theory of finite dimensional vector spaces, while the generalization is proved
herein by approximating the goodness of fit criteria by functions of statistics whose
asymptotic distributions were derived in L. A. Goodman's "Exact Probabilities and Asymp-
totic Relationships for Some Statistics from m-th Order Markov Chains." The generaliza-
tion follows: Let $\{X_1, X_2, \ldots, X_N\}$ be an observed sequence from a stochastic process
where each random variable takes as values only the integers $1, 2, \ldots, s$. Let $f_u$ be the
observed frequency of the m-tuple $u = (u_1, u_2, \ldots, u_m)$. Let $H_n$ be the composite hy-
pothesis that the process is a chain of order $n$ ($m + 1 > n \ge 0$). Let $H'_n$ be any simple
hypothesis within $H_n$, and let $\hat{H}_n$ be the maximum-likelihood estimate of $H'_n$. Let

$$\psi^2_{m,n} = \sum_u [f_u - \hat{f}_{u,n}]^2 / \hat{f}_{u,n},$$

where $\hat{f}_{u,n}$ is the expected value of $f_u$, given $\hat{H}_n$, in a new sequence of length $N$. Then,
when $H_n$ is true, $\psi^2_{m,n}$ has asymptotically ($N \to \infty$) a distribution $\ast_{\lambda=1}^{m-n} K_{g(\lambda)}(x/\lambda)$, where
$\ast$ denotes convolution, $g(\lambda) = s^{m-1-\lambda}(s - 1)^2$, and $K_{g(\lambda)}(x/\lambda)$ is the $\chi^2$-distribution with
$g(\lambda)$ degrees of freedom. (Received July 18, 1957.)


32. On the Bivariate Sign Test, Isadore Blumen, Cornell University.


A test for the hypothesis that the median of a bivariate distribution is $(u, v)$ is called
a sign test if it is based on the direction of the $n$ vectors from $(u, v)$ to $(x_i, y_i)$, where
$i = 1, \ldots, n$. The test proposed here is obtained by ranking the vectors according to the slope
$(y_i - v)/(x_i - u)$. The statistic used is $\nu^2 = (\nu_1^2 + \nu_2^2)/n$, where $\nu_1 = \sum_{j=1}^{n} a_j \cos(\pi j/n)$
and $\nu_2 = \sum_{j=1}^{n} a_j \sin(\pi j/n)$, $a_j$ is $\pm 1$ according as the difference $y_j - v$ is positive or nega-
tive, and the index $j$ corresponds to the vector with the $j$th largest slope. The large sample
distribution of this statistic is obtained and it is compared with a number of other sign
tests in relative efficiency. (Received July 18, 1957.)


33. The Telephone Trunking Problem (Preliminary Report), Herbert Scarf,
The RAND Corporation, (Introduced by T. E. Harris). 


Customers arrive at a service point, with independent, identically distributed, inter- 
arrival distributions. There are N servers, each of which serves according to the same 
negative exponential distribution. The assumption is made that a customer departs im- 
mediately if all of the servers are busy at the moment of his arrival, so that no queue is 
formed. This model is solved for an arbitrary interarrival distribution, in the sense that 
an explicit formula is obtained for the probability distribution of the number of busy 
servers. In addition, a relatively simple formula is given for the expected fraction of the 
customers turned away. (Received July 19, 1957.) 


34. Asymptotic Independence of Tests of Parametric Forms of Cell Probabil-
ities in the Analysis of Categorical Data, Earl L. Diamond, University
of North Carolina, (By Title).


This is a generalization of some results given by Mitra in "Contributions to the Sta-
tistical Analysis of Categorical Data" (North Carolina Institute of Statistics mimeograph
series No. 142). We start from a product of multinomial distributions of the form

$$\phi = \prod_i \Bigl[ n_{i0}! \prod_j p_{ij}^{n_{ij}} \Big/ \prod_j n_{ij}! \Bigr]$$

with $\sum_j p_{ij} = 1$; $i = i_1 i_2 \cdots i_k$; $j = j_1 j_2 \cdots j_l$; $i_1 \in (r_1)_{i_2 \cdots i_k}$ (a subset of $r_1$ depending on
the subscript set $i_2 \cdots i_k$); $i_2 \in (r_2)_{i_3 \cdots i_k}$, $\ldots$, $i_{k-1} \in (r_{k-1})_{i_k}$; $i_k = 1, 2, \ldots, r_k$ and
$j_1 = 1, 2, \ldots, s_1$; $\ldots$; $j_l = 1, 2, \ldots, s_l$.
We next consider the hypotheses $H_0^{(1)}: p_{ij} = f_{ij}^{(1)}(\theta_1, \ldots, \theta_{t_1})$ subject to
$g_m^{(1)}(\theta_1, \ldots, \theta_{t_1}) = 0$ ($m = 1, 2, \ldots, u_1 < t_1$)
and $H_0^{(2)}: p_{ij} = f_{ij}^{(2)}(\theta'_1, \ldots, \theta'_{t_2})$ subject to $g_m^{(2)}(\theta'_1, \ldots, \theta'_{t_2}) = 0$ ($m = 1, 2, \ldots, u_2 < t_2$)
where $t_1, t_2 <$ [total number of cells - total number of multinomial distributions]. Each
hypothesis is a composite one in which the $\theta$'s or the $\theta'$'s are the nuisance parameters and
$f_{ij}^{(1)}$, $f_{ij}^{(2)}$, $g_m^{(1)}$, and $g_m^{(2)}$ are known functions. Necessary and sufficient conditions for the
asymptotic independence of $H_0^{(1)}$ and $H_0^{(2)}$ are derived, these conditions being extensions
of similar conditions for more special cases discussed in Mitra's paper. (Received July
22, 1957.)


35. Tests of Parametric Forms of Cell Probabilities and their Asymptotic 
Power in the Analysis of Categorical Data, Earl L. Diamond, University
of North Carolina, (By Title). 


This is a generalization of some material in a previous paper by the author ("Exten-
sion of some results given by Mitra on 'Statistical Analysis of Categorical Data' ") pre-
sented at the March, 1957, IMS meetings in Washington, D.C. We start from a product
of multinomial distributions of the form $\phi = \prod_i [n_{i0}! \prod_j p_{ij}^{n_{ij}} / \prod_j n_{ij}!]$ with $\sum_j p_{ij} = 1$;
$i = i_1 i_2 \cdots i_k$; $j = j_1 j_2 \cdots j_l$; $i_1 \in (r_1)_{i_2 \cdots i_k}$ (a subset of $r_1$ depending on the subscript set
$i_2 \cdots i_k$); $i_2 \in (r_2)_{i_3 \cdots i_k}$, $\ldots$, $i_{k-1} \in (r_{k-1})_{i_k}$; $i_k = 1, 2, \ldots, r_k$ and $j_1 = 1, 2, \ldots, s_1$;
$\ldots$; $j_l = 1, 2, \ldots, s_l$. We next consider the hypothesis $H_0: p_{ij} = f_{ij}(\theta_1, \ldots, \theta_t)$ subject
to $g_m(\theta_1, \ldots, \theta_t) = 0$ ($m = 1, 2, \ldots, u < t$) against the alternative

$$H_a: p_{ij} = f_{ij}(\theta_1, \ldots, \theta_t) + n^{-1/2}\delta_{ij}$$

subject to $g_m(\theta_1, \ldots, \theta_t) = 0$, where $t <$ (total number of cells - total number of mul-
tinomial distributions). The hypothesis is a composite one in which the $\theta$'s are the nui-
sance parameters and $f_{ij}$ and $g_m$ are known functions. Tests are given for hypotheses anal-
ogous to the hypotheses of "no partial correlation," "no multiple correlation,"
"no canonical correlation," and "complete independence" in multivariate analysis, and
analogous to the hypotheses of "no block effect," "no treatment effect," and "no block
or treatment effect" in analysis of variance. The asymptotic power of each test is derived.
(Received July 22, 1957.)


36. Tests of Functional Forms of Cell Probabilities and their Asymptotic Power
in the Analysis of Categorical Data, EARL L. DIAMOND, University of
North Carolina.


This is an extension of some results given by Mitra in "Contributions to the Statistical
Analysis of Categorical Data" (North Carolina Institute of Statistics mimeograph series
No. 142) and amplified by Ogawa in "On the Mathematical Principles Underlying the
Theory of the $\chi^2$ Test" (North Carolina Institute of Statistics mimeograph series No.
162). We start from a product of multinomial distributions of the form

$$\phi = \prod_i \Bigl[ n_{i0}! \prod_j p_{ij}^{n_{ij}} \Big/ \prod_j n_{ij}! \Bigr]$$

with $\sum_j p_{ij} = 1$; $i = i_1 i_2 \cdots i_k$; $j = j_1 j_2 \cdots j_l$; $i_1 \in (r_1)_{i_2 \cdots i_k}$ (a subset of $r_1$
depending on the subscript set $i_2 \cdots i_k$); $i_2 \in (r_2)_{i_3 \cdots i_k}$, $\cdots$, $i_{k-1} \in (r_{k-1})_{i_k}$;
$i_k = 1, 2, \cdots, r_k$ and $j_1 = 1, 2, \cdots, s_1$; $\cdots$; $j_l = 1, 2, \cdots, s_l$. We next consider
the hypothesis

$$H_0: f_m(p_{ij}\text{'s}) = 0 \quad (m = 1, 2, \cdots, t)$$

with $t <$ (total number of cells $-$ total number of multivariate distributions), against
the alternative $H_n: f_m(p_{ij}\text{'s}) = n^{-1/2}\delta_m$. A test is given for hypotheses of this form and the
asymptotic power of this test is derived. As an example, the case in which

$$f_m(p_{ij}\text{'s}) \quad (\text{for } m = 1, 2, \cdots, t)$$

are linear is developed in detail. (Received July 22, 1957.)


37. Asymptotic Normality and Efficiency of Certain Nonparametric Test
Statistics, HERMAN CHERNOFF AND I. RICHARD SAVAGE, Stanford University.


Let $X_1, \cdots, X_m$ and $Y_1, \cdots, Y_n$ be observations from the continuous cumulative
distribution functions $F(x)$ and $G(x)$ respectively. If $z_{iN} = 1$ when the $i$th smallest of
$N = m + n$ observations is from $F$ and $z_{iN} = 0$ otherwise, then many nonparametric test
statistics are of the form $T = \sum_{i=1}^{N} E_{iN} z_{iN}$. Theorems of Wald and Wolfowitz,
Noether, Hoeffding, Lehmann, Madow, and Dwass have given sufficient conditions for the
asymptotic normality of $T$. In this paper we extend these results to cover more situations
with $F \ne G$. In particular it is shown for all alternative hypotheses that the Fisher-Yates,
Terry-Hoeffding $c_1$-statistic is asymptotically normal and the test for translation based
on it is at least as efficient as the t-test. (Received July 22, 1957.)
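
As an illustration of the form $T = \sum E_{iN} z_{iN}$ only, the following is a minimal sketch, not the authors' procedure. The function names and the sample data are hypothetical, and the normal-quantile scores are merely an assumed stand-in for the expected normal order statistics on which the $c_1$-statistic is based.

```python
from statistics import NormalDist


def linear_rank_statistic(xs, ys, score):
    """Return T = sum_i E_iN * z_iN, where z_iN = 1 when the i-th smallest
    of the N = m + n pooled observations comes from the first sample."""
    pooled = sorted([(v, 1) for v in xs] + [(v, 0) for v in ys])
    n_total = len(pooled)
    return sum(score(i, n_total) * z
               for i, (_, z) in enumerate(pooled, start=1))


def wilcoxon_score(i, n_total):
    # E_iN = i gives the rank sum of the first sample.
    return i


def normal_quantile_score(i, n_total):
    # Assumed approximation to the expected normal order statistics.
    return NormalDist().inv_cdf(i / (n_total + 1))


# Hypothetical data, with no ties (the distributions are continuous).
xs = [1.2, 3.4, 0.7]
ys = [2.5, 4.1, 5.0, 2.9]
print(linear_rank_statistic(xs, ys, wilcoxon_score))         # ranks 1 + 2 + 5
print(linear_rank_statistic(xs, ys, normal_quantile_score))
```

Passing the scores as a function keeps the indicator sum separate from the choice of $E_{iN}$, so other score sequences can be tried without touching the rest.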


38. Effects and the Classical Analysis of Variance Mixed Model, Mary D. 
Lum, Wright Air Development Center. 


Consider the two-factor mixed model $x_{ijr} = M + A_i + b_j + (Ab)_{ij} + e_{ijr}$
$(i = 1, \cdots, I;\ j = 1, \cdots, J;\ r = 1, \cdots, R)$,
where $M$, $A_i$ are constants; $b_j$, $(Ab)_{ij}$, $e_{ijr}$ are independently normally distributed with
zero means and constant variances $\sigma_b^2$, $\sigma_{Ab}^2$, $\sigma_e^2$, respectively. Besides the effects $b_j$, $(Ab)_{ij}$
one can alternatively consider the effects $\beta_j$, $\gamma_{ij}$: $\beta_j = b_j + (Ab)_{\cdot j}$, $\gamma_{ij} = (Ab)_{ij} - (Ab)_{\cdot j}$,
where $(Ab)_{\cdot j} = \sum_{i=1}^{I} (Ab)_{ij}/I$. The effects $\gamma_{ij}$ are subject to the linear constraints
$\sum_i \gamma_{ij} = 0$; nevertheless the mean squares are distributed as Chi-squares and F-tests are valid.
The experimenter is usually more interested in the effects $\beta_j$, $\gamma_{ij}$ rather than $b_j$, $(Ab)_{ij}$;
though it may also be desirable to investigate the latter. An F-test of the hypothesis
$\beta_j = 0$ involves the mean square for error as the appropriate error term, whereas that of the
hypothesis $b_j = 0$ involves the mean square for interaction. It is thus shown that the
suitability of any mean square as an error term simply depends on the particular effect in
which one is interested. The same argument can be extended to the general n-factor mixed
model. (Received July 22, 1957.)
39. Equally Spaced Levels for Multi-level Continuous Sampling Plans
(Preliminary Report), DONALD GUTHRIE, JR. AND M. V. JOHNS, JR., Stanford
University, (By Title).


The tightened multi-level continuous sampling scheme proposed by Derman, Littauer,
and Solomon (Ann. Math. Stat., Vol. 28, No. 2, June, 1957) calls for sampling at levels
$1, f, f^2, \cdots, f^k$, with $k$ possibly infinite. An alternative scheme proposed in this paper calls
for sampling at levels $1, f, f/2, \cdots, f/k$. If $k = \infty$, then for plans with levels $1, f, f^2, \cdots$,
all levels of inspection correspond to null recurrent states in the Markov chain describing
the process for $p$ less than the AOQL. In the scheme discussed here all levels of inspection
are ergodic for all values of $p$. In this respect the proposed plan gives better protection
against a sudden deterioration of quality. For $k = 2, 3, 4, 5$, the plans are compared on
the basis of expected first passage time to 100% inspection if $p$ goes from $p_1$ (good quality)
to $p_2$ (bad quality) for fixed AOQL, and average fractions inspected equal at $p_1$ and $p_2$.
In addition, for both types of plans, contours of constant AOQL are given for $k = 2, 3,
4, 5, \infty$. (Received July 22, 1957.)


40. Extension of the Mann-Whitney "U" Test to Samples Censored at the
Same Fixed Point, MAX HALPERIN, National Institutes of Health.


Suppose we have random samples of size $m$ and $n$ from populations with continuous
cumulatives, $G$ and $F$ respectively. Denote an observation from $G$ by $y$ and from $F$ by $x$.
Let both samples be censored to the right at the same fixed point, $x = T$, $y = T$. A
statistic, $U_c$, is defined, which is the sum of (1) the usual $U$ statistic, as defined
to test against the alternative $F(x) > G(x)$, all $x$, computed for the uncensored elements
of both samples, and (2) the product of the number of uncensored $y$'s and the number of
censored $x$'s. It is shown that, when $F = G$, $-\infty < x \le T$, and for the total number of
censored elements in both samples fixed at the total number observed, say $r$, the
distribution of $U_c$ is independent of the specific nature of $F$, and $U_c$, properly standardized, is
shown under appropriate conditions on $m$, $n$, $r$, to have an asymptotically normal
distribution. A test of the null hypothesis, $F = G$, $-\infty < x \le T$, based on $U_c$ is proposed and
shown to be consistent against the alternative, $F(x)/F(T) > G(x)/G(T)$, $F(T) > G(T)$,
$-\infty < x < T$. This alternative implies $F(x) > G(x)$, $-\infty < x \le T$. (Received July 22,
1957.)
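
The two-part definition of the statistic can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's computation. The function name `censored_u` is hypothetical, and the "usual U" is assumed here to count the pairs in which a $y$ precedes an $x$, a convention chosen because term (2) counts pairs of exactly that kind. A censored observation is represented by any value greater than $T$.

```python
def censored_u(xs, ys, t):
    """Sketch of U_c for two samples censored to the right at the point t.

    Any value greater than t stands for a censored observation.
    """
    x_unc = [x for x in xs if x <= t]
    y_unc = [y for y in ys if y <= t]
    n_x_censored = len(xs) - len(x_unc)
    # (1) Usual U on the uncensored elements; assumed convention:
    #     the number of (x, y) pairs with y < x.
    u_usual = sum(1 for x in x_unc for y in y_unc if y < x)
    # (2) Every uncensored y precedes every censored x.
    return u_usual + len(y_unc) * n_x_censored


# Hypothetical data censored at T = 10; 12 and 11 are censored values.
print(censored_u([1, 4, 9, 12], [2, 3, 11], 10))  # 4 + 2 * 1 = 6
```

Pairs in which both members are censored, or in which only the $y$ is censored, contribute nothing under this convention.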


41. On some Distribution-free Bias Properties of the Latent Roots of Real
Symmetric Random Matrices, H. ROBERT VAN DER VAART, University
of North Carolina, (By Title).


If the probability distribution of the $k \times k$ real symmetric random matrix $F$ is
continuous and satisfies $\mathcal{E}(F) = \Phi$, then $\mathcal{E}(l_1) < \lambda_1$, $\mathcal{E}(l_k) > \lambda_k$, $\mathcal{E}(l_k - l_1) > \lambda_k - \lambda_1$, and

$$\mathcal{E}\Bigl(\sum_{h=1}^{k} l_h\Bigr) = \sum_{h=1}^{k} \lambda_h .$$

Under the same conditions $\mathrm{Med}(l_1) \le \lambda_1$, $\mathrm{Med}(l_k) \ge \lambda_k$, whereas for many common
distributions of $F$ these inequalities regarding the medians are strict. In these statements
random matrix means matrix of random elements; the probability distribution of a matrix
stands for the joint distribution of its elements; the expectation of a matrix stands for the
matrix of the expectations of its elements; $l_1 \le l_2 \le \cdots \le l_k$ are the latent roots of $F$ (hence
random variables) arranged according to increasing magnitude, while $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_k$
are the latent roots of $\Phi$ arranged in a similar way. Denote by $\Lambda$ the diagonal matrix
consisting of the elements $\lambda_1, \lambda_2, \cdots, \lambda_k$ and by $L$ the diagonal matrix consisting of the
elements $l_1, l_2, \cdots, l_k$. Then the proof is based on the existence of orthogonal matrices $Y$,
$U$ and $V$ such that $\Phi = Y \Lambda Y'$, $F = ULU'$ and $YV = U$. Here $Y$ is a matrix of parameters
and $U$ and $V$ are matrices of random variables. (Received July 22, 1957.)
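
The direction of these biases can be checked numerically for $k = 2$. The sketch below is a rough Monte Carlo illustration and not part of the abstract. It assumes a diagonal expectation matrix with latent roots 1 and 2 and independent normal errors on the three distinct elements; the function names, the error standard deviation, and the seed are all hypothetical choices.

```python
import math
import random


def latent_roots_2x2(a, b, c):
    """Latent roots l1 <= l2 of the symmetric matrix [[a, b], [b, c]]."""
    centre = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return centre - radius, centre + radius


def simulate_root_means(lam1, lam2, sd, n_rep, seed=0):
    """Average l1, l2 and trace over n_rep draws of F with E(F) = diag(lam1, lam2)."""
    rng = random.Random(seed)
    sum_l1 = sum_l2 = sum_trace = 0.0
    for _ in range(n_rep):
        a = lam1 + rng.gauss(0.0, sd)   # element f_11
        c = lam2 + rng.gauss(0.0, sd)   # element f_22
        b = rng.gauss(0.0, sd)          # element f_12, expectation 0
        l1, l2 = latent_roots_2x2(a, b, c)
        sum_l1 += l1
        sum_l2 += l2
        sum_trace += a + c
    return sum_l1 / n_rep, sum_l2 / n_rep, sum_trace / n_rep


mean_l1, mean_l2, mean_trace = simulate_root_means(1.0, 2.0, 0.5, 20000)
print(mean_l1, mean_l2, mean_trace)
```

In such a run the average smallest root falls below 1 and the average largest root lies above 2, while the sum of the two averages equals the average trace up to rounding, in line with the stated equality for the sum of the roots.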


42. On the Distribution of the Latent Roots of Real Symmetric Random
Matrices with Multinormally Distributed Elements, H. ROBERT VAN DER
VAART, University of North Carolina.


Let the $\tfrac12 k(k + 1)$ random elements
$f_{ij}$ $(ij = 11, \cdots, 1k, 22, \cdots, 2k, \cdots, k-1\,k-1, k-1\,k, kk)$
of a $k \times k$ real symmetric random matrix $F$ be multinormally distributed with
$\mathcal{E}(f_{ij}) = \phi_{ij}$ and $\mathrm{Cov}(f_{ij}, f_{pq}) = \sigma_{ij,pq}$. Then in order that the joint distribution of the latent roots
$l_1 \le l_2 \le \cdots \le l_k$ of the matrix $F$ depends only on the various $\sigma_{ij,pq}$-values and on the
latent roots $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_k$ of the matrix $\Phi = \| \phi_{ij} \|$, and not on the elements of any
orthogonal matrix $Y$ with $\Phi = Y \Lambda Y'$ (the elements of $Y$ would be, in a sense, nuisance
parameters), there is an interesting necessary and sufficient condition on the $\sigma_{ij,pq}$-values.
Furthermore, in case $k = 2$ a number of results on the joint distribution of $l_1$ and $l_2$
(evidently depending on the matrix of $\sigma_{ij,pq}$-values) are presented. They regard both the
amounts of bias of the $l_i$ as estimators of the $\lambda_i$ and their variances and covariances.
(Received July 22, 1957.)


43. Bias in Certain Current Procedures of Response Surface Estimation, H. 
ROBERT VAN DER VAART, University of North Carolina (By Title). 


Let $\xi$ and $\phi$ be real $k \times 1$ matrices, $\Phi$ a $k \times k$ real symmetric matrix, $\eta$ a real scalar
variable of which $\eta_0$ is a certain value; $\xi$, $\phi$, $\Phi$ and $\eta$ are non-random quantities. The equation
of any quadratic response surface has the form $\eta - \eta_0 = \phi'\xi + \xi'\Phi\xi$. Let the $k \times k$ real
symmetric random matrix $F$ be continuously distributed with $\mathcal{E}(F) = \Phi$; then $F$ is currently used
as an estimator of $\Phi$, the type of response surface being estimated from the latent roots
of $F$. The latter estimation method is biased: one will estimate unduly often that the
surface is saddle-shaped when in fact it has a minimum or maximum. Another corollary from
the distribution theory of latent roots is that generally the variances of the "canonical
quadratic effects" (i.e., the latent roots of $F$) are different from the variances of $f_{11}$, $f_{22}$,
etc., even with second order rotatable designs. These designs do have the gratifying
property that if the elements of $F$ are multinormally distributed the joint distribution of the
latent roots of $F$ does not depend on the nuisance parameters represented by the elements
of any orthogonal matrix $Y$ with $\Phi = Y \Lambda Y'$. Here $\Lambda$ is the diagonal matrix consisting of
the latent roots of $\Phi$. (Received July 22, 1957.)


44. On the Numerical Computation of Certain Multivariate Normal Integrals, 
H. RoBERT VAN DER VAART, University of North Carolina, (By Title). 


Let $y$ be an $n \times 1$ matrix, $C$ a real symmetric positive definite $n \times n$ matrix with
$c_{ii} = 1$ $(i = 1, \cdots, n)$. Consider (1): $|C|^{-1/2} \int \exp(-\tfrac12 y'C^{-1}y)\, dy_1 \cdots dy_n$, where integration
is over $y_i > 0$ $(i = 1, \cdots, n)$. If $C$ is a Jacobi matrix (i.e., $c_{ij} = 0$ for $|i - j| > 1$) the
integral (1) reduces to a sum of integrals (2): $\int |E|^{-1/2}\, de_{12}\, de_{34} \cdots de_{m-1,m}$, with $m \le [\tfrac12 n]$,
$m$ even, where $E$ represents any principal minor matrix of $C$ with row indices

$$i_1 < i_2 < \cdots < i_{m-1} < i_m$$

and integration is over $0 < e_{12} < c_{i_1 i_2}, \cdots, 0 < e_{m-1,m} < c_{i_{m-1} i_m}$. If $m = 4$, then (2) is
at most a double integral, which for most sets of $c_{ij}$-values is numerically computable
to a degree of accuracy very satisfactory for statistical purposes by a nine-point
integration method (equivalent to Gauss' three-point method for single integrals applied twice),
or even, with less accuracy, by a five-point method which is not equivalent to any
univariate method applied twice. The simplicity (greatly enhanced by certain symmetry
properties of $|E|$ with respect to $c_{ij} = 0$) and accuracy of these methods make numerical
computation of certain multinormal integrals very easy. Generalization to $m > 4$ is
immediate. If integration in (1) is over $y_i > u_i \ne 0$ $(i = 1, \cdots, n)$ or if $C$ is
no longer a Jacobi matrix, essentially the same methods apply, though without the
shortcuts made possible by the above-mentioned symmetries, which are then destroyed. (Received
July 22, 1957.)


45. Birth and Death Random Walk Process in s Dimensions, J. NEYMAN AND
ELIZABETH L. SCOTT, University of California.


Consider two finite sequences $\{a_i\}$ and $\{b_j\}$ of Borel sets in $s$ dimensional Euclidean
space $R_s$. The $a_i$ are disjoint "cells". The $b_j$ are disjoint "regions". Consider
dimensionless particles ("organisms", male or female) located in $R_s$. For $0 \le T_1 \le T_2$, the symbol
$\theta(T_1, T_2)$ denotes the conditional probability that an organism aged $T_1$ will live to be $T_2$.
At times $t = 0, 1, 2, \cdots$ each surviving female gives birth to a litter composed of a
random number $\nu$ of organisms of the next generation. $\phi$ is the probability that an organism born
is female. All variables $\nu$ are identically distributed, mutually independent and
independent of all other random variables of the system. Given that an organism survives up to
time $T_2$ and given that at time $T_1 < T_2$ it is located at $X(T_1) = x_1 \in R_s$, the function

$$f(x_1, x_2, T_1, T_2)$$

represents the conditional density at $x_2 \in R_s$ of $X(T_2)$, the position of this organism at
time $T_2$. All organisms random walk independently. For $t \ge 0$ and for a Borel set $c \subset R_s$,
the symbols $\alpha(c, t)$, $\beta(c, t)$ represent the numbers of males and females of the ancestral
generation born at $t = 0$ in the given sequence $\{a_i\}$ of cells, and $\gamma(c, t)$, $\delta(c, t)$
those of male and female descendants of ancestors born in $\{a_i\}$, all alive at time $t$ and at
that time located in $c$. The results obtained concern the joint distribution of $n$ quadruples
$\alpha(b_j, t)$, $\beta(b_j, t)$, $\gamma(b_j, t)$, $\delta(b_j, t)$ corresponding to an arbitrary set of $n$ regions $\{b_j\}$. This
distribution is expressed in terms of the distribution of $\{\alpha(a_i, 0), \beta(a_i, 0)\}$ and in terms
of unspecified $\theta$, $\phi$ and $f$. Applications include astronomy, ecology, and radioactive
phenomena. (Received July 24, 1957.)


46. On Convergence of Distribution Functions and of Moments of Order
Statistics, MANDAKINI ROHATGI, University of California, (By Title,
introduced by J. Neyman).


Let $Y_{1n} < Y_{2n} < \cdots < Y_{sn}$ be the $s$ smallest order statistics out of a sample of size $n$
from a distribution $F(x)$. It is assumed that there exist constants $a_n > 0$ and $b_n$ such that
the normalized statistic $(Y_{1n} - b_n)/a_n$ has a limiting distribution that is nondegenerate as
$n$ tends to infinity. Then the normalized statistics $(Y_{in} - b_n)/a_n$, $i = 1, \cdots, s$, have a joint
limiting distribution. Furthermore the conditional distribution function of the normalized
$Y_{2n} < \cdots < Y_{sn}$, given the normalized $Y_{1n}$, has a limit as $n$ tends to infinity, which is a
distribution function, and the conditional distribution function of the normalized
$Y_{1n} < \cdots < Y_{(s-1)n}$, given the normalized $Y_{sn}$, has a limit which is also
a distribution function. If $n$ is a random variable $N$ whose distribution depends on a
parameter $\gamma$ and if $N/\gamma$ tends to unity in probability, as $\gamma$ tends to infinity, then there exist
constants $\alpha_\gamma$ and $\beta_\gamma$ such that $(Y_{1N} - \beta_\gamma)/\alpha_\gamma$ has a limiting distribution as $\gamma$ tends to
infinity and $\alpha_\gamma$ and $\beta_\gamma$ can be taken to be $a_\gamma$ and $b_\gamma$ respectively. In the special case when
$F(x)$ is a Normal distribution function, the first moment of the normalized $Y_{1n}$ converges to the first
moment of the limiting distribution, the second moment of the normalized $Y_{1n}$ is bounded for large $n$
and the third moment of the normalized $Y_{1n}$ diverges. (Received July 24, 1957.)


47. Best Unbiased Tests of Composite Hypotheses with s Constraints,
MANDAKINI ROHATGI, University of California.


Let $X$ be a random vector with probability density depending on $s + k$ parameters
$\xi_1, \cdots, \xi_s, \theta_1, \cdots, \theta_k = (\xi, \theta)$. Hypotheses considered are $H_1: \xi = \xi^0$, $\theta$ unknown, and
$H_2: \xi_1 = \xi_2 = \cdots = \xi_s = \xi$, $\xi$ unspecified, $\theta$ unknown. Locally most powerful unbiased tests
with constant power on ellipsoids are derived for testing $H_1$ and $H_2$, under a set of
assumptions similar to those made in the Neyman-Pearson theory of testing hypotheses.
As an example, a test for equality of variances in $s$ Normal populations is given. In the
case when the sample sizes are equal, the locally most powerful test with spherical power 


v4 t 


surfaces is Dizi St = , where 2 is a function of Dizi Si . (Received July 24, 1957.) 


48. On the Asymptotic Distribution of the Likelihood-ratio in some Mixed
Variates Populations, J. OGAWA, M. D. MOUSTAFA AND S. N. ROY,
University of North Carolina, (By Title).


Let the likelihood function of the population under consideration be $P(X \mid H_0)$ and
$P(X \mid H)$ under the null-hypothesis $H_0$ and the alternative hypothesis $H$ respectively;
then it is well known that under certain conditions the random variable

$$-2 \log \lambda = -2 \log \bigl[\max P(X \mid H_0) / \max P(X \mid H)\bigr]$$

has the $\chi^2$-distribution with suitable degrees of freedom in the limit as $n$, the sample size,
tends to infinity, provided the null-hypothesis $H_0$ is true. S. S. Wilks (1939) got this result
based upon J. L. Doob's work (1934). Later (1943) A. Wald obtained the same result
starting from somewhat stronger assumptions. However, as far as the authors are aware,
no complete proof along Wilks' line has been published so far. In this
note, the authors are concerned with the asymptotic distributions of $-2 \log \lambda$ for testing
various kinds of null-hypothesis in certain mixed variates populations. For that purpose
the authors present a complete proof of the above mentioned proposition, and then
verify the validity of Doob's assumptions in each case of main concern to the
authors. (Received July 24, 1957.)
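
For readers unfamiliar with the statistic, the following minimal sketch evaluates $-2 \log \lambda$ in the simplest setting, a binomial sample with the simple null hypothesis $p = p_0$ against an unrestricted $p$. This setting is an assumption chosen for illustration and is not one of the mixed variates populations treated by the authors. The function name and the data are hypothetical; 3.841 is the familiar upper 5% point of $\chi^2$ with 1 degree of freedom.

```python
import math


def neg2_log_lambda_binomial(k, n, p0):
    """-2 log(lambda) for H0: p = p0 (0 < p0 < 1), from k successes in n trials."""
    def loglik(p):
        # Binomial log-likelihood up to an additive constant;
        # terms with a zero count are omitted (0 * log 0 taken as 0).
        value = 0.0
        if k > 0:
            value += k * math.log(p)
        if k < n:
            value += (n - k) * math.log(1.0 - p)
        return value

    p_hat = k / n  # maximum likelihood estimate under the alternative
    return -2.0 * (loglik(p0) - loglik(p_hat))


CHI2_1DF_UPPER_5PCT = 3.841
stat = neg2_log_lambda_binomial(k=62, n=100, p0=0.5)
print(stat, stat > CHI2_1DF_UPPER_5PCT)
```

With these hypothetical counts the statistic is a little under 6, so the null hypothesis would be rejected at the 5% level under the large-sample approximation.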


49. On Test of a Certain Hypothesis based upon Selected Sample Quantiles,
J. OGAWA, University of North Carolina.


The author reported on the estimation of the location and scale parameters based upon 
the selected sample quantiles, and determined the optimum spacings for the normal and 
exponential distributions. The author will present here the theory of testing a certain 
hypothesis and show that the optimum spacings for estimation purpose turn out to be the 
spacings which give the greatest powers for testing purpose. (Received July 24, 1957.) 


50. Run Tests and Likelihood Ratio Tests for Markov Chains, LEO A.
GOODMAN, University of Chicago, (By Title).


This article first discusses some runs tests as tests of randomness in a single sequence 
of alternatives, where the number of kinds of alternatives is s and where the sequence is 
long. Simple derivations of some long sequence run tests are given by making use of some 
results concerning the asymptotic distribution of the observed transition numbers in a
sequence from a Markov chain. Since the asymptotic distribution of the transition pro- 
portions is related to some standard asymptotic results for multinomial trials, a close 
relationship is observed between certain asymptotic results in the distribution theory of 
runs and in the standard distribution theory for multinomial trials. Large sequence runs 
tests are presented for certain generalizations of the null hypothesis of randomness and
for certain alternate hypotheses concerning Markov chains, and the asymptotic
distributions are obtained under both null and alternate hypotheses, thus generalizing some
standard results in the distribution theory of runs. Runs tests for the special case s = 2
are studied in detail. Some simplified likelihood ratio statistics for testing generalizations
of the null hypothesis of randomness and hypotheses concerning the order of a Markov
chain are studied in detail and compared with other statistics that have been suggested 
in the literature. In the errata to his paper in Biometrika, Vol. 42 (1955), pp. 531-533, I. 
J. Good refers the reader to the present work; some of the statistics mentioned 
in his paper have been studied further herein and a number of inaccuracies in his paper 
have been corrected. (Received July 25, 1957.) 
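
The link between transition counts and multinomial theory can be illustrated with the familiar likelihood ratio statistic for testing independent trials (order 0) against a first-order Markov chain. The sketch below is a generic textbook form under that assumption, not necessarily one of the simplified statistics studied in the article; the function name and the example sequence are hypothetical.

```python
import math
from collections import Counter


def markov_order_lr_statistic(sequence):
    """-2 log(lambda) for order 0 against order 1, with its degrees of freedom.

    Uses the observed transition numbers n_ij and their row and column totals.
    """
    transitions = Counter(zip(sequence, sequence[1:]))
    row_totals = Counter()
    col_totals = Counter()
    for (i, j), n_ij in transitions.items():
        row_totals[i] += n_ij
        col_totals[j] += n_ij
    total = sum(transitions.values())
    statistic = 2.0 * sum(
        n_ij * math.log(n_ij * total / (row_totals[i] * col_totals[j]))
        for (i, j), n_ij in transitions.items()
    )
    n_states = len(set(sequence))
    return statistic, (n_states - 1) ** 2


# A strictly alternating sequence is far from random.
stat, df = markov_order_lr_statistic("ABABABAB")
print(stat, df)
```

Only observed transitions enter the sum, so empty cells of the transition table contribute nothing, as in the corresponding contingency table computation.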


51. Further Results in Testing of Hypotheses on a Multivariate Population,
some of the Variates being Continuous and the Rest Categorical, M. D.
MOUSTAFA, University of North Carolina, (By Title).


Consider a multi-way table such that certain ways refer to continuous variates and the 
other ways are categorical. For certain problems all the categorical ways refer to variates; 
for certain other problems all of them refer to ways of classification; and for some prob- 
lems some of the ways refer to variates and the rest to ways of classification. The author 
assumes that the conditional distribution of the continuous variates, given the categorical 
variates, is a multinormal distribution; and in case some of the categorical ways are ways 
of classification, the conditional distribution of the continuous variates will be a set of 
independent multinormal distributions. For such multi-way tables, tests for hypotheses 
like, say that of conditional independence or joint independence or total independence, 
etc., are formulated. Considering large sample tests, the statistic used is the $-2 \log \lambda$
statistic, which, in each of these situations, is shown, in another paper, to have
asymptotically the $\chi^2$-distribution; but to adapt it to this study, the fact that some of the
variates are categorical should be noticed. The author suggests a statistic which is
algebraically simpler, more convenient and is asymptotically equivalent, in probability, to the
$-2 \log \lambda$ statistic when the latter is calculated directly from the likelihood ratio.
(Received July 25, 1957.)


52. Random Walks in the Plane with General Absorbing Barriers, M. V. JOHNS,
JR., Stanford University.


Let $Y_j$, $j = 1, 2, \cdots$, be independent zero-one random variables with
$\mathrm{Prob}\{Y_j = 1\} = p$.
The points $(S_n, n)$, $n = 1, 2, \cdots$, where $S_n = \sum_{j=1}^{n} Y_j$, describe the path of a random
walk. An absorbing barrier may be characterized by a non-decreasing sequence of positive
integers $a_1, a_2, \cdots$, where absorption takes place at the $n$th stage if

$$S_j \le a_j, \quad j = 1, 2, \cdots, n - 1,$$

and $S_n > a_n$. Without loss of generality it may be assumed that $a_{j+1} \le a_j + 1$, for all $j$.


The probability of attaining the point (k, n) without prior absorption is shown by an ele- 
mentary argument to be 
where the $\gamma$'s are determined recursively by


where $\gamma_0 = 1$ and where the $\alpha$'s are determined as follows: $\alpha_j = j$, $j = 1, 2, \cdots, j_0$, where
$j_0$ is the smallest $j$ such that $a_j = j$; $\alpha_j = \min\{i : a_i = j\}$, $j > j_0$. A similar expression is
obtained for the case of a random walk between two absorbing barriers. Application of
these results yields an explicit solution of the classical "gambler's ruin" problem where
the number of plays is finite and the stakes risked at each play are fixed but not
necessarily equal. (Received July 26, 1957.)
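
The probability of attaining a point without prior absorption can also be evaluated numerically by stepping through the stages. The sketch below is a direct stage-by-stage recursion offered as a check on small cases. It is not the explicit expression of the abstract, and the function name and the examples are hypothetical.

```python
def prob_reach_without_absorption(k, n, a, p):
    """P(S_n = k and S_j <= a_j for j = 1, ..., n - 1).

    a is the barrier sequence a_1, a_2, ... as a list, so a[j - 1] is a_j;
    it must contain at least n - 1 terms.
    """
    # dist[s] = probability that the walk is at height s at the current
    # stage without having been absorbed at any earlier stage.
    dist = {0: 1.0}
    for stage in range(1, n + 1):
        new_dist = {}
        for s, prob in dist.items():
            # A path above the barrier at the previous stage was absorbed there.
            if stage > 1 and s > a[stage - 2]:
                continue
            new_dist[s] = new_dist.get(s, 0.0) + prob * (1.0 - p)
            new_dist[s + 1] = new_dist.get(s + 1, 0.0) + prob * p
        dist = new_dist
    return dist.get(k, 0.0)


# Barrier a_j = 1 for all j, fair steps: the paths 101 and 011 reach (2, 3);
# the path 110 is absorbed at stage 2.
print(prob_reach_without_absorption(2, 3, [1, 1, 1], 0.5))  # 0.25
```

With a barrier too high to be reached, the function reduces to the ordinary binomial probability, which gives a simple consistency check.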





53. Mathematical Developments in the Theory of Human Lethal Dose,
(Preliminary Report), CLIFFORD J. MALONEY, Fort Detrick.


A number of two-parameter families of curves (probit, logit, sinit) and one one-parameter
family (exponential response curve) have been proposed to express the relation
between biological response and intensity of a deleterious agent. Two measures of response
in the case of disease agents are (1) sickness and (2) death. It was observed (Maloney,
Proceedings of the Second Army Conference on Design of Experiments, 1956) that, provided
morbidity and mortality are related to dose level by the same form of curve, and since
all dead individuals had necessarily been ill, the morbidity curve must always lie
above the mortality curve when the two are plotted on the same graph, and hence can
be used to infer one parameter of the mortality curve from those of the morbidity curve.
The problem is so important that a search for minimum conditions for the validity of the 
conclusion is appropriate. The present paper considers the consequence of some relaxa- 
tion of the assumption that the curve relating mortality to dose is a member of the same 
two parameter family that relates morbidity to dose. (Received July 26, 1957.) 


54. A Matrix Definition of the Correlation between Two Sets of Variables,
ANDRE G. LAURENT, Michigan State University, (By Title).


Let $X_1$, $X_2$ be $p \times 1$ and $q \times 1$ random vectors, $p \ge q$, and $X = (X_1, X_2)'$ be $N(0, \sigma)$
with covariance matrix $\sigma = (\sigma_{ij})$, $i = 1, 2$, $j = 1, 2$ (all matrices nonsingular); let

$$\sigma_{ii} = \sigma_i \sigma_i',$$

where $\sigma_i$ is triangular. Implicit in and consistent with Hotelling's definition of canonical
correlations is the intuitively "natural" generalisation of the correlation between $X_1$ and
$X_2$ as a matrix, namely $P = \sigma_2^{-1} \sigma_{21} \sigma_1'^{-1}$, which yields the covariance matrix of $X_2$, given
$X_1$, as $\sigma_2 (I - PP') \sigma_2'$. The squares of the canonical correlations are the roots of

$$| I\rho^2 - PP' | = 0.$$

$\sigma = \mathrm{Diag}(\sigma_i)\, \mathcal{R}\, \mathrm{Diag}(\sigma_i)'$, where $\mathcal{R}$ is the generalised correlation matrix

$$\mathcal{R} = \begin{pmatrix} I & P' \\ P & I \end{pmatrix}, \qquad \mathcal{D} = \begin{pmatrix} I & D_\rho \\ D_\rho & I \end{pmatrix},$$

where $D_\rho = \mathrm{Diag}(\rho_k)$, $k = 1, \cdots, p$, is the canonical correlation matrix and $\mathcal{R}$ and $\mathcal{D}$
are "equivalent" in the sense that there exists $A$ such that

$$A \mathcal{R} A' = \mathcal{D}, \quad A = \mathrm{Diag}(A_1, A_2),$$

$A_i$ orthogonal. If $X$ is $N(0, I)$ and $R$ is the "correlation" based on sample values, i.e.,
R= S;'Su8;", the distribution of R is a, | I] — RR’ | ‘*-»-«-*/2 dR, with 


sg aoaia (" : ‘) a r("S t) / (2-2-4) ven r(* Ea), 


A scalar correlation coefficient between X, and Xz can be obtained by means of a proper 
scalar function of RR’. In case X = (X, , Xz , X3)’, similar generalisations of the multiple 
and the partial correlations, (starting from conditional distributions) yield the identity 
UI — PseyPaey) = U — PraPa)U — PnP). (Received July 29, 1957.) 


55. On Ranking Parameters of Location and Scale in Continuous Populations,
K. C. SEAL, Calcutta University, (By Title).


The general problem of selecting from a given set of continuous populations a subset, 
which should contain the most desirable population, such as the population having the 
largest or smallest parameter of location or scale with certain specified risk, is studied 
in this paper. The problem of ranking either the location or scale parameters when all
other parameters are assumed to be known is at first considered. The closely similar prob- 
lem when both location and scale parameters are assumed to be unknown but when one 
of these two parameters, either location or scale, can be eliminated by the method of stu- 
dentization, is then discussed. It is also shown that the analogous problem of ranking 
parameters belonging to multivariate populations is readily solvable from the proposed 
solution to the above problems. When the same experiment is to be continued to more 
than one stage the modifications required for the solution to such allied multistage prob- 
lems are also indicated in broad outline. The decision procedure suggested for these prob- 
lems is shown to possess many desirable properties which include properties of unbiased- 
ness, gradation and monotonicity. The suggested decision rule also minimizes the expected 
size of the finally retained subset in most situations and, in fact, may be taken to be the 
optimum from an infinite class of decision rules. (Received July 29, 1957.) 


56. Statistical Estimate and Control of the Costs Caused by Accidents in a
Factory, HANS BÜHLMANN, University of California, (Introduced by E.
L. Scott).


Consider the random variable $Z$ equal to the total cost of all accidents in a time interval of
fixed length divided by the sum of all salaries paid in the same time. $Z$ depends on two
factors, the number of accidents $W$ and the amount of damage $X$ caused by each accident.
Approximating the frequency of $W$ by a normal distribution and assuming the
distribution of damage to be of $\Gamma$-type, the distribution of $Z$ is obtained. Records of the number
of accidents and the total costs in the past intervals provide maximum likelihood esti- 
mates of the parameters and the expected value of Z. In practice, we want to estimate 
the values of the parameters and to test the hypothesis that they are unchanged. The 
results obtained include sufficient statistics for each of the parameters with their dis- 
tributions. (Received August 2, 1957.) 
NEWS AND NOTICES


Readers are invited to submit to the Secretary of the Institute news items of interest 


Personal Items 


The following IMS members were awarded travel grants from the National 
Science Foundation to enable them to attend the Congress of the International 
Union for the Scientific Study of Population and the 30th Session of the Inter- 
national Statistical Institute to be held concurrently in Stockholm, Sweden, 
August 8 to 15, 1957: Joseph Berkson, Mayo Clinic; Albert H. Bowker, Stanford 
University; Jerome Cornfield, National Institutes of Health; Philip M. Hauser, 
University of Chicago; and Eugene Lukacs, Catholic University. 

Allan G. Anderson has resigned his position as Mathematician in the Glass 
Division Research Laboratory of the Pittsburgh Plate Glass Company and is 
now employed as Chief Statistician at the General Tire and Rubber Company
in Akron, Ohio. 

Fred C. Andrews has resigned his position at the University of Nebraska and 
is now an Associate Professor of Mathematics at the University of Oregon. 

Abdur Rahman Ansari is now a Research Assistant in the Department of 
Statistics at Virginia Polytechnic Institute. 

John H. Bailey received an M.S. in Prob. and Stat. at the University of Utah 
on June 10, 1957. He is continuing graduate studies toward a Ph.D. in the 
Department of Statistics at the University of North Carolina. 

Robert Bechhofer of the Department of Industrial and Engineering Adminis- 
tration and of the Statistics Center at Cornell University has recently been 
promoted from Associate Professor to Professor. 

Patrick Billingsley has left the U. S. Navy to accept a National Science
Foundation Postdoctoral Fellowship at Princeton University. 

Bradley D. Bucher was awarded a Ph.D. degree in mathematical statistics at 
Princeton University in June, 1957. He has accepted employment at the In- 
stitute for Defense Analyses at the Pentagon in Washington, D. C. 

Elliot Cramer, who received his M.A. degree from The Johns Hopkins
University in June, has been commissioned in the U. S. Public Health Service and
is currently working in the Biometrics Branch of the National Institutes of 
Health. 

Paul Dorweiler, Actuary of the Aetna Casualty and Surety Company, retired 
in May after serving 38 years in the company. 

Dr. Alvin V. Fend, of New Mexico, has joined the staff of Technical Opera- 
tions, Incorporated. Originally from Chicago, Dr. Fend is a graduate of the 
University of Illinois, where he received his B.S. and M.S. in pure mathematics 
and his M.A. and Ph.D. in mathematical statistics. Most recently he has been a 
member of the academic faculty of the New Mexico College of Agriculture and 
Mechanical Arts and has engaged in statistical consulting. 

Dr. G. Ronald Herd resigned his position as Consultant with Aeronautical 
Radio, Inc., in Washington, D. C., and joined the Washington staff of Applied
Research, Inc., a subsidiary of Booz, Allen and Hamilton, and is engaged in 
Operations Research. 

Wayne H. Jones, formerly at the National Security Agency, Washington, 
D. C., is now working for the System Development Corporation, Santa Monica, 
California. 

Dr. Edward L. Kaplan is on leave of absence from the Mathematical Research 
Department, Bell Telephone Laboratories, Murray Hill, N. J., until January. 
Currently he has a part-time position as mathematician in the Theoretical 
Division, University of California Radiation Laboratory, Livermore, California. 

Robert M. Kozelka has accepted a position with Williams College in the 
Mathematics Dept. He spent the summer at Stanford University attending the 
Institute in Social Sciences for College Teachers of Mathematics sponsored by 
the Social Science Research Council. 

William G. Madow of the University of Illinois has accepted a position as 
senior research mathematical statistician at Stanford Research Institute and 
consulting professor of statistics at Stanford University. 

Jack Moshman has left the Bell Telephone Laboratories, Inc., where he was a 
consulting statistician, to assume the post of Director of the Division of Mathe- 
matical and Statistical Services of the Council for Economic and Industry 
Research, Inc. 

Gottfried E. Noether is on leave of absence from Boston University. He spent 
the summer at the Mathematical Center in Amsterdam. During 1957-58 he will 
serve as Fulbright lecturer in mathematical statistics at the University of 
Tuebingen, Germany. 

Jack I. Northam, formerly Assistant Professor of Mathematics, Kansas State 
College, is now with The Upjohn Company, Kalamazoo, Michigan, as a consult- 
ing statistician for chemical and biological applications. 

Carl R. Ohman has been appointed Assistant Professor in the Department of 
Mathematics at Knox College, Galesburg, Illinois, for the 1957-58 session. 

Roy Radner has taken a joint appointment in the Departments of Economics 
and Statistics at the University of California, Berkeley. 

Lila Knudsen Randolph has recently moved from Collingswood, New Jersey, 
to Bethesda, Maryland. She is employed as a statistician at the National Cancer 
Institute. 

Dr. I. Richard Savage has accepted an appointment as Associate Professor in 
the School of Business Administration and the School of Public Health at the 
University of Minnesota. 

Dr. D. E. W. Schumann, Head, Department of Statistics, University of 
Stellenbosch, Stellenbosch, South Africa, and Dr. R. A. Bradley, Professor of 
Statistics, Department of Statistics, Virginia Polytechnic Institute, Blacksburg, 
Virginia, were co-authors of a paper entitled "The Comparison of the Sensitivities 
of Similar Experiments", which won the 1957 J. Shelton Horseley Research 
Award of the Virginia Academy of Science at the May meeting at Old 
Point Comfort, Virginia. The J. Shelton Horseley Research Award is awarded 
annually to the best paper submitted in the competition in Virginia. 

Franklin Sheehan of San Francisco has joined the staff of Technical Opera- 
tions, Incorporated in Monterey. Mr. Sheehan received his Bachelor of Science 
degree from Stanford University, and has done work towards his doctorate at 
Stanford University. Sheehan has been an instructor in Mathematics at Menlo 
College, and an instructor in Mathematics and Statistics at San Francisco State 
College. He will be an Operations Analyst for the Monterey research and de- 
velopment firm. 

Paul N. Somerville has accepted a position with A. M. Mood and the General 
Analysis Corporation at Fort Huachuca, Arizona. 

Dr. Robert L. Stearman has resigned his position as Biometrician of the 
National Institute of Arthritis and Metabolic Diseases to accept a position with 
a biological operations research team of Applied Research, Inc. 

Ray B. Stiver, Jr., has recently accepted a position as a Statistician with 
Health Research, Inc., Roswell Park Memorial Institute Division, Buffalo, New 
York. He was formerly employed by the Bell Aircraft Corporation, Niagara 
Falls, New York, as a Reliability Engineer. In his new position he will be work- 
ing as a statistician on various cancer chemotherapy programs. 

David van Tijn has accepted a position with the Washington office of Applied 
Research, Inc. 

Lionel Weiss, formerly at the University of Oregon, has accepted an appoint- 
ment as Associate Professor in the Department of Industrial and Engineering 
Administration and in the Statistics Center at Cornell University. During the 
academic year 1957-58 he will be offering several courses in the Department of 
Mathematics as a replacement for Jack Wolfowitz who will be on leave. 

William Wolman, formerly with the Quality Control Division of the Bureau 
of Ordnance, is now a consultant statistician with the Bureau of Yards and 
Docks, Department of the Navy, Washington, D. C. 




New Members 


The following persons have been elected to membership in the Institute 
May 2, 1957, to August 7, 1957 


Ang, Hian Liang, M.Sc. (Univ. of Indonesia, Bandung), Graduate Student, University of 
California; 2523 Ridge Road, Apt. 315, Berkeley 9, Calif. 

Balakrishnan, A. V., Ph.D. (Univ. of California, Los Angeles), Assistant Professor, 
University of California, Los Angeles; 7609 W. 91st Place, Los Angeles 45, Calif. 

Bartsch, Glenn E., Sc.D. (The Johns Hopkins Univ.), Research Associate, School of Hy- 
giene and Public Health, The Johns Hopkins University, 615 North Wolfe Street, Balti- 
more 5, Maryland. 

Blumstein, Alfred, M.A. (Univ. of Buffalo), Graduate Student, Dept. of Industrial and 
Engineering Administration, Cornell University, Ithaca, New York. 

Celis, Julis, Lic. en Ciencias Estadisticas (la Universidad Central de Venezuela), Chief, 
Section of Demography and Territory, Director General of Statistics and Census, 
Caracas, Venezuela. 

Clunies-Ross, Charles W., Diploma in Mathematical Statistics (Trinity College, Cam- 
bridge Univ.), Associate Professor of Statistics, Virginia Polytechnic Inst., Blacksburg, 
Virginia. 

Diederichs, John K., B.A. (Univ. of Chicago), Manager, Engineering Economics Research 
Department, Armour Research Foundation, Chicago, Illinois; 329 Callan Avenue, 
Evanston, Illinois. 

Dor, Leopold, Docteur en Sciences Mathematiques (Univ. de Liege), Chef du Service d’Or- 
ganisation, S.A. COCKERILL-OUGREE, Seraing S/M, Belgium. 

Eid, M. M., Diploma of Inst. of Statistics (Cairo Univ.), Student, Institute of Statistics, 
University of North Carolina, Chapel Hill, N.C. 

Glaze, Walter, Jr., B.A. (Univ. of Denver), Captain, U. S. Air Force, Headquarters, A. F. 
Institute of Technology, Wright-Patterson AFB, Ohio; 55 Callie Lane, Menlo Park, 
Calif. 

Glenn, William A., B.A. (Univ. of New Brunswick), Graduate Student, Virginia 
Polytechnic Institute, Box 454, Blacksburg, Virginia. 

Goldman, Thomas A., M.A. (George Washington Univ.), Economist, The RAND Corp., 
1700 Main St., Santa Monica, Calif. 

Harrington, Edwin C., Jr., M.Chem. (Cornell Univ.), Scientist, Monsanto Chemical Co., 
Springfield, Mass.; 428 Mountain Road, Wilbraham, Mass. 

Heaney, Marian T., M.A. (Columbia Univ.), Student on Government Fellowship, National 
Security Agency, Washington 25, D. C.; 8658 Piney Branch Road, c/o Maki, Silver 
Spring, Maryland. 

Hexter, Alfred C., B.A. (Univ. of Calif., Berkeley), Student, University of California, De- 
partment of Statistics, Berkeley, California. 

Hopkins, Earl E., Ph.D. (Harvard Univ.), Associate Professor, University of Oregon Medical 
School, 3181 Sam Jackson Park, Portland 1, Oregon. 

Howard, William J., B.S. (Stanford Univ.), Statistical Analyst, North American Aviation, 
Inc., Rocketdyne Division, 6633 Canoga Avenue, Canoga Park, Calif.; 20462 Vose 
Street, Canoga Park, Calif. 

Jackson, Benjamin A., Ph.D. (New York Univ.), Research Biologist, American Cyanamid 
Co., Lederle Laboratories Div., Pearl River, New York. 

Kamat, A. R., Ph.D. (London Univ.), Head, Dept. of Mathematics and Statistics, Fergus- 
son College, Poona 4, India; 2, Fergusson College, Poona 4, India. 

Kimura, Kiyo, Diploma in Aeronautics (Nagoya Technical College), Assistant of Mathe- 
matics, c/o Department of Mathematics, Mie Prefectural University, Otanicho, Tsu, 
Mie-Ken, Japan. 

Lordan, Joseph D., B.S. (Mass. Inst. of Technology), Mathematician, Lincoln Laboratory, 
P.O. Box 73, Lexington 73, Mass. 

Monzon, J. N. Acosta, Licencia de Ciencias Estadisticas (Universidad Central de 
Venezuela), Compania Shell de Venezuela, Division de Contraloria, Refineria Cardon, 
Punto Fijo, Edo. Falcon, Venezuela. 

Mote, Vasant L., M.Sc. (Univ. of Bombay), Student, Box 5457, N.C. State College, Raleigh, 
North Carolina. 

Moustafa, Madany Disouky, Ph.D. (Univ. of London), Lecturer, Institute of Statistics, 
Cairo University, Cairo, Egypt. 

Nylander, John E., M.A. (Univ. of Illinois), Graduate Fellow, Math. Dept., Wayne State 
University, Detroit, Mich.; 14870 Burgess, Detroit 23, Mich. 

Owen, Joel, M.A. (Boston Univ.), Engineer, Sylvania Electric Products, Inc., Analysis 
Dept., Waltham 54, Mass.; 38 Lawrence Ave., Roxbury 21, Mass. 

Peller, Sigismund, M.D. (Univ. of Vienna), Physician in private practice, 164 East 81st 
St., New York 28, N. Y. 

Polaneczky, Aloysius J., B.S. in Chem. Eng. (Drexel Institute of Tech.), Senior Res. Engr., 
The Franklin Institute of the State of Pennsylvania, Philadelphia 3, Pennsylvania; 
307 Lyster Rd., Oreland, Pa. 

Radner, Roy, Ph.D. (Univ. of Chicago), Assistant Professor, Cowles Foundation for 
Research in Economics, Department of Economics, Yale University, New Haven, 
Connecticut; Box 2126, Yale Station, New Haven, Connecticut. 

Rothman, David, M.A. (Univ. of Wisconsin), Research Engineer, North American Aviation, 
Inc., Rocketdyne Div., 6633 Canoga Ave., Canoga Park, Calif. 

Shiue, Cherng-jiann, Ph.D. (Univ. of Minnesota), Assistant Professor, School of Forestry, 
Univ. of Minnesota, St. Paul 1, Minnesota. 

Stansbrey, John J., Ph.D. (Washington Univ., St. Louis, Mo.), Research Chemist, Funda- 
mental Research Dept., National Cash Register Co., Dayton 9, Ohio; 1808 Shroyer 
Road, Dayton 9, Ohio. 

Starks, T. H., M.S. (Purdue Univ.), Graduate Student, Statistics Dept., Virginia Poly- 
technic Inst., Blacksburg, Virginia. 

Taylor, John, M.A. (Cambridge Univ.), Statistician, Food Research Dept., Unilever, Ltd., 
Colworth House, Sharnbrook, Bedford, England. 

Varady, John Carl, Jr., B.S. (Calif. Inst. of Tech.), Teaching Assistant, University of 
Washington, Department of Mathematics, Seattle, Washington. 

Vartak, Monohar Narhar, M.Sc. (Univ. of Bombay), Temporary Lecturer in Statistics, 
Dept. of Statistics, Univ. of Bombay, University Bldgs., East Wing, Mahatma Gandhi 
Road, Bombay 1, India. 

Walker, James W., Ph.D. (Univ. of N. C.), Assistant Professor, School of Mathematics, 
Georgia Institute of Technology, Atlanta, Georgia. 

Wallech, Henry, B.A. (Univ. of Calif., Los Angeles), Research Analyst, North American 
Aviation, Inc., Rocketdyne Div., 6633 Canoga Ave., Canoga Park, Calif. 

Wylie, Jack E., B.A. (Univ. of California, Los Angeles), Head of Reliability and Consult- 
ant, Lear, Inc., General Analysis Corp., 11753 Wilshire Blvd., W. Los Angeles, Calif. 


Survey of Research Potential and Training in the Mathematical Sciences 


The Committee on the Survey announces the completion of the Survey of 
Research Potential and Training in the Mathematical Sciences. The Survey 
began in January 1955, and nearly thirty mathematicians have served on its 
Committee and four Subcommittees during the past two and one-half years. 

The Survey has produced three documents. One of these is the Report on a 
Conference on Undergraduate Mathematics Curricula. The Conference was held 
at Hunter College on October 12-13, 1956 and was attended by 25 representa- 
tives of various colleges and universities. There were formal reports by various 
speakers covering many aspects of, and subjects related to, undergraduate edu- 
cation in mathematics, and these formal reports have been incorporated into the 
conference report. This document has been circulated widely. A small number of 
copies have been deposited with the National Science Foundation and are avail- 
able on request. 

The other two documents constitute the Final Report. Part I is a 163 page 
report on the organization of the Survey and the results of two large scale data 
gathering activities of the Committee. The first activity consisted of interviews 
of the principal Ph.D. granting institutions. Fifty-nine departments of mathe- 
matics were visited, and three others completed the interview schedule without 
a visit. The data presented as obtained by these interviews are concerned with 
such matters as facilities, libraries, promotion and leave policies, administration, 
and salaries. The remainder of the report presents a set of tables giving the 
composite life story of 1851 mathematicians who received the Ph.D. from 1915 
to 1954, as derived from their replies to a very penetrating questionnaire. 

Part II of the Final Report consists of the reports of the four Subcommittees, 
and the 28 resulting recommendations of the Committee. We note that the 
Subcommittee on Undergraduate Colleges recommended the extended support of 
prize competitions in mathematics, urged that an attempt be made to call to 
the attention of industry the need for support of mathematics in undergraduate 
colleges, and requested that the National Science Foundation publish annually 
an up-to-date record of the mathematics faculties of all colleges and universities 
with the degrees held, and the date and place these degrees were conferred. 

The report of the Subcommittee on Research Environment discussed such 
matters as facilities, salaries, leaves and fellowships, and made a number of 
recommendations on these matters. 

The Subcommittee on Non-Teaching Opportunities recommended the explora- 
tion of further methods for establishing understanding among academic mathe- 
maticians of the nature of non-academic mathematics and made recommenda- 
tions on various other related matters. The report also called attention to the 
recent action of the American Mathematical Society in adding the SIAM Jour- 
nal to the group of journals receiving a subsidy from the American Mathematical 
Society, and suggested the possibility that the Journal may provide another 
desirable publication outlet for some of the research results of non-academic 
mathematicians. 

The final report, from the Subcommittee on Publications, recommended the 
establishment of a journal for expository articles, and the establishment of 
prizes for mathematical books, recommended the subsidization of mathematical 
publications by the National Science Foundation on a permanent basis, and 
recommended that the volume of translations published by the American Mathe- 
matical Society be at least doubled. 

The Final Report has been distributed to the chairmen of about 110 depart- 
ments of mathematics, to Committee and Subcommittee members, and to the 
officers of various mathematical organizations. A limited number of additional 
copies are available, and can be obtained by writing to the Program Director 
for Mathematics, National Science Foundation, 1520 H Street, Washington 
25, D. C. 

A. A. ALBERT 
Chairman, Committee on the Survey 
June 10, 1957. 





NEWS AND NOTICES 
Catholic University Program 


At The Catholic University of America a program of late afternoon and 
evening courses leading to the master and doctoral degrees in mathematics will 
be offered in the fields of analysis, algebra, theory of probability, mathematical 
statistics, and numerical analysis. The program of public lectures in the field of 
Mathematical Statistics which were given monthly during the past year under 
the sponsorship of the National Science Foundation will be resumed under the 
same sponsorship in late September. 


Preliminary Actuarial Examinations Prize Awards 


The winners of the prize awards offered by the Society of Actuaries to the 
nine undergraduates ranking highest on the score of Part 2 of the 1957 Pre- 
liminary Actuarial Examination are as follows: 

First Prize of $200: Robert Solovay, Harvard University. 

Additional Prizes of $100 each: 

Flittie, John H.            Drake University 
Gardner, John R.            University of Toronto 
Kandall, Geoffrey A.        Princeton University 
Lakser, Harry               University of Manitoba 
Lichtenbaum, Stephen        Harvard University 
Posner, Paul                Princeton University 
Sadowsky, George            Harvard University 
Zvengrowski, Peter D.       Rensselaer Polytechnic Institute 

The Society of Actuaries has authorized a similar set of nine prizes for the 
1958 examinations on Part 2. 


The Preliminary Actuarial Examinations consist of the following three exami- 
nations: 


Part 1. Language Aptitude Examination. (Reading comprehension, meaning 
of words and word relationships, antonyms, and verbal reasoning.) 

Part 2. General Mathematics Examination. (Algebra, trigonometry, coordinate 
geometry, differential and integral calculus.) 

Part 3. Special Mathematics Examination. (Finite differences, probability 
and statistics.) 

The 1958 Preliminary Actuarial Examinations will be prepared by the 
Educational Testing Service under the direction of a committee of actuaries and 
mathematicians and will be administered by the Society of Actuaries at centers 
throughout the United States and Canada on May 14, 1958. The closing date 
for applications is April 1, 1958. 


National Academy of Sciences-National Research Council Division of 
Mathematics 


Fellowship and Research Opportunities 


The Division of Mathematics wishes to call attention to the fact that several 
foundations and offices will offer financial support for research in mathematics 
during the year 1958-59. A number of fellowships will be made available, as 
well as opportunities for mathematicians to engage in basic research. A partial 
list, with comments, is given below. 


1. National Science Foundation. The National Science Foundation sponsors various fellow- 
ship programs in the sciences, including mathematics. 

Predoctoral fellowships are awarded annually at the First Year, Intermediate, and 
Terminal Year levels of graduate study. Applications for 1958-1959 will be available in October 
1957 from the National Academy of Sciences—National Research Council until the closing 
date in early January 1958. Award date: March 15, 1958. 

Science Faculty fellowships for college science teachers (including mathematics) who plan 
to continue teaching and wish to increase their competence as teachers are at the present 
time offered semi-annually. Eligibility requirements include a baccalaureate degree and 
three (3) years of full-time experience in teaching natural science subjects at the collegiate 
level. The program will be opened to application in October 1957; and closed early in 
January 1958. Awards will be made on March 20, 1958. The program will also be reopened the 
following summer. Address all inquiries for information and applications to National 
Science Foundation, Division of Scientific Personnel and Education, Washington 25, D. C. 

Postdoctoral fellowships (in making inquiry about postdoctoral awards specify program). 

(1) Regular postdoctoral fellowships—primarily for recent recipients of the doctoral 
degree; awarded semi-annually. Program for 1958-1959 concurrent with predoctoral pro- 
gram (see above) except that program closes in December. Information and applications 
will be available from NAS-NRC. The program will also be open from July to early 
September 1958. Awards are announced in March and October. 

(2) Senior postdoctoral fellowships—are open to persons who have held a doctoral degree 
in one of the basic fields of science for a minimum of five (5) years at time of application, or 
who have had equivalent training and experience. Awarded semi-annually. Applications 
are available from the National Science Foundation, Division of Scientific Personnel and 
Education, Washington 25, D. C. The program will be opened from October 1957 until 
January 1958. Awards will be announced on March 18, 1958. The program will be reopened 
in the summer of 1958. 

Research Grants. The National Science Foundation also supports basic research in the 
mathematical sciences by means of grants. While proposals for such support are accepted 
at any time, individuals desiring support to begin in the summer or at the beginning of a 
fall semester should preferably submit their proposals in the mathematical sciences by 
November 1; persons desiring support to begin in the spring semester should preferably 
submit their proposals by May 1. Instructions for the preparation of proposals, contained in 
a booklet entitled Grants for Scientific Research, may be obtained upon request from the 
Program Director for Mathematical Sciences, National Science Foundation, 1520 H Street, 
N. W., Washington 25, D. C. 

2. Office of Naval Research. The Office of Naval Research, through contracts with 
universities and other organizations, supports basic research in broadly selected fields of 
mathematics. Proposals should be directed to the Mathematics Branch, Office of Naval Research, 
Washington 25, D. C. In addition, postdoctoral research associateships in pure mathematics 
are being established under contracts with the ONR at selected universities. For details 
and application forms write to the above address. 

3. Air Force Office of Scientific Research. The Air Force Office of Scientific Research supports 
research in mathematics directly through contracts with colleges, universities, foundations 
and industrial laboratories. Such organizations are encouraged to submit proposals for 
research in mathematical fields in which they specialize. Proposals should be mailed to the 
Commander, Air Force Office of Scientific Research, Attn: Mathematics Division, Wash- 
ington 25, D. C. 

4. Office of Ordnance Research, U. S. Army. Among the functions of the Office of Ordnance 
Research is the support of basic research in mathematics. Proposals for projects are ordinar- 
ily made by individual scientists or groups of scientists in a form which leads to a contract 
between the Office of Ordnance Research and a university or research laboratory. For 
further information write to Commanding Officer, Office of Ordnance Research, Box CM, 
Duke Station, Durham, North Carolina. 

5. Fulbright Awards—Public Law 584 (79th Congress). Approximately 400 awards are offered 
annually for university lecturing and postdoctoral research in all academic fields in Aus- 
tralia, Burma, Chile, Colombia, Ecuador, India, New Zealand, Pakistan, Paraguay, Peru, 
the Philippines, Thailand (competition for the preceding countries closes April 15, 1958); 
Austria, Belgium-Luxembourg, Denmark, Finland, France, Germany, Greece, Iceland, 
Ireland, Israel, Italy, Japan, the Netherlands, Norway, Turkey, and the United Kingdom 
including colonial dependencies (competition for the latter countries closes October 1, 1958). 
In both cases awards are for the academic year 1959-60 (the 1958-59 competition for Europe 
closes October 1, 1957), but in the former group of countries the academic year begins in the 
spring or summer instead of the autumn. Awards are payable in foreign currency and usually 
include travel for the grantee, but not for members of his family, and a maintenance allow- 
ance, which may be adjusted in relation to the number of accompanying dependents up to 
four. Requests for information should be addressed to the Committee on International 
Exchange of Persons, Conference Board of Associated Research Councils, 2101 Constitu- 
tion Avenue, Washington 25, D. C. 

6. National Bureau of Standards. Naval Research Laboratory. Oak Ridge National Labora- 
tory. Postdoctoral resident research associateships are available in a variety of sciences 
including mathematics and are tenable at the Washington, D. C. and Boulder, Colorado 
laboratories of the National Bureau of Standards; at the Naval Research Laboratory in 
Washington, D. C.; and at the Oak Ridge National Laboratory in Oak Ridge, Tennessee. 
Necessary facilities and equipment incident to the research of the associate will be provided. 
For further information write to Fellowship Office, National Academy of Sciences—National 
Research Council, 2101 Constitution Avenue, Washington 25, D. C. Applications for the 
1958-59 program must be filed on or before January 13, 1958. 

7. Atomic Energy Commission. The Division of Research of the Atomic Energy Commission 
through contracts with universities and other organizations supports research in the fields 
of numerical analysis, digital computer design, programming research, and related topics. 
Proposals should be submitted to the Division of Research, Atomic Energy Commission, 
1901 Constitution Avenue, Washington 25, D. C. 

Brookhaven National Laboratory. Brookhaven National Laboratory, operated by As- 
sociated Universities, Inc. under contract with the Atomic Energy Commission offers post- 
doctoral research appointments in mathematics. Appointments are for one year, and may be 
renewed for one additional year. U.S. citizenship is not required, although Atomic Energy 
Commission approval is a prerequisite. The appointee may work in numerical analysis, 
digital computing, mathematical physics, differential equations, probability and statistics, 
and various specialized branches including reactor theory, hydrodynamics, and orbit 
theory. Computational facilities are available. Letters from candidates should give details 
of personal history, scientific background, and qualifications; two letters of 
recommendation, one from the applicant's research professor, are required. Applications for the 
academic year 1958-59 must be received by August 15, 1958 and should be directed to M. E. 
Rose, Head, Applied Mathematics Division, Brookhaven National Laboratory, Upton, 
Long Island, New York. 


PAUL A. SMITH, Chairman 
Division of Mathematics 
HAROLD W. KUHN, Executive Secretary 
Division of Mathematics 
September 1, 1957 




NRC-NBS Research Associateships 


Research associateships, supported by the National Bureau of Standards and 
awarded on recommendations of the National Academy of Sciences—National 
Research Council, are offered to provide young investigators of unusual promise 
and ability the opportunity for basic research in various branches of the physical 
and mathematical sciences. These associateships are open only to citizens of the 
United States and are tenable at the National Bureau of Standards in Washing- 
ton, D. C. Applicants must have the Ph.D. or Sc.D. degree, or their equivalent. 
The term of the appointment is for one calendar year. It is expected that approxi- 
mately 10 awards may be made in a total of fourteen fields, of which the following 
are of particular interest to mathematical statisticians: Applied Mathematical 
Statistics and Numerical Analysis. Awards will be made about April 1, 1958. 
Appointments will be for one year. The annual gross stipend will be $7035 and 
will be subject to income tax. Requests for application forms and for additional 
information about requirements for applications should be addressed to the 
Fellowship Office, National Academy of Sciences—National Research Council, 
2101 Constitution Avenue, N. W., Washington 25, D. C. Applications for the 
academic year 1958-1959 must be received in the Fellowship Office no later than 
January 13, 1958. 




Postdoctoral Study in Statistics 

Awards for study in statistics by persons whose primary field is not statistics 
but one of the physical, biological, or social sciences to which statistics can be 
applied are offered by the Department of Statistics of the University of Chicago. 
The awards range from $3,600 to $5,000 on the basis of an eleven month residence. 
The closing date for application for the academic year 1958-9 is February 15, 
1958. Further information may be obtained from the Department of Statistics, 
Eckhart Hall, University of Chicago, Chicago 37, Illinois. 




REPORT OF THE ATLANTIC CITY, N. J. MEETING OF THE INSTITUTE 


The seventy-fourth meeting of the Institute of Mathematical Statistics, which 
was also the twentieth annual meeting, was held in Atlantic City, New Jersey, on 
September 10-13, 1957 in conjunction with the meetings of the American Statistical 
Association, the Biometric Society (ENAR) and the Econometric Society. The 
list of persons attending the meeting was not available when this report went to 
the printer. 

The program was as follows: 


Tuesday, September 10, 1957 
10:00—-11:30 a.m. Session of Mixed Topics 


Chairman: Meyer Dwass, Northwestern University 
Papers: 1. Some Remarks on Statistical Inference, HERBERT ROBBINS, Columbia 
University 
2. How to Gamble If You Must, LESTER E. DUBINS, Carnegie Institute of 
Technology, and LEONARD J. SAVAGE, University of Chicago 


11:30-12:30 p.m. Wald Lecture I 


Chairman: T. E. Harris, The RAND Corporation 
Paper: 1. Pólya-Type Theory, SAMUEL KARLIN, Stanford University 


2:00-3:30 p.m. Rietz Lecture 


Chairman: Jerzy Neyman, University of California, Berkeley 
Paper: 1. The Coding of Messages Subject to Chance Errors, J. WOLFOWITZ, Cornell 
University 


3:30-6:00 p.m. Multiple Decision Selection Procedures 


Chairman: William G. Cochran, The Johns Hopkins University 
Papers: 1. A Sequential Multiple Decision Procedure for Selecting the Best One of Several 
Normal Populations with a Common Unknown Variance, and Its Use with 
Various Experimental Designs, ROBERT E. BECHHOFER, Cornell 
University 
2. Multiple Decision Procedures for Selecting that Multinomial Event Which Has 
the Largest Probability, SALAH ELMAGHRABY and NORMAN MORSE, Cornell 
University 
3. A Parametric Approach to the Problem of Selecting a Subset Containing the 
Best Population, MILTON SOBEL, Bell Telephone Laboratories 


4:00-6:00 p.m. Use of Stochastic Processes in Biology (With the Biometrics 
Section (ASA) and the Biometric Society (ENAR)) 


Chairman: Paul Meier, The Johns Hopkins University 
Papers: 1. The Role of Variable Generation Time in Tissue Cell Multiplication, JOSEPH 
G. HOFFMAN, Roswell Park Memorial Institute 
2. Application of a Stochastic Model of Contagion to Data from a Foundling 
Home, EVELYN FIX, University of California, Berkeley 
3. On the Probability of Survival of Bacteria in Sea Water, EUGENE K. HARRIS, 
Public Health Service 


8:00 p.m. 1957 Council Meeting 


President: A. M. Mood, General Analysis Corporation 


Wednesday, September 11, 1957 


8:30-10:30 a.m. Application of Statistical Theory to Computers (With the 
American Statistical Association in General Session) 


Chairman: John W. Mauchly, Sperry Rand Corp. 

Papers: 1. Some Probability Considerations in the Estimation of Computer Running 
Time, ELI S. MARKS and BENJAMIN J. TEPPING, National Analysts, Inc. 
2. A Model for UNIVAC Operations, LEON GILFORD, Bureau of the Census 
3. Statistical Strategies in Data Handling, J. T. CHU, A. W. HOLT, and WILLIAM 
TURANSKI, Sperry Rand Corp. 
4. A Decision Model for Computer Utilization, MORRIS J. SOLOMON, American 
Greetings Corp. 


9:00-11:30 a.m. Stochastic Processes 


Chairman: H. B. Mann, Ohio State University 
Papers: 1. The Identifiability Problem for Functions of Finite Markov Chains, DAVID 
BLACKWELL, University of California, Berkeley 
2. Cosmic Ray Cascade Processes, T. E. HARRIS, The RAND Corporation 
3. Some Problems in Applied Stochastic Processes from the Point of View of 
Semi-group Theory, A. T. BHARUCHA-REID, University of Oregon 
4. On Testing the Markov Property, P. BILLINGSLEY, Princeton University 


11:30-12:30 p.m. Wald Lecture II 


Chairman: William G. Madow, Stanford Research Institute 
Paper: 1. Pólya-Type Theory, SAMUEL KARLIN, Stanford University 


2:00-3:30 p.m. Special Invited Paper 


Chairman: Leo Katz, Michigan State University 
Paper: 1. Asymptotic Approximations to Distributions, DAVID L. WALLACE, University 
of Chicago 


2:00-4:00 p.m. Some Classification Problems in Sociology and Psychology 
(With the American Statistical Association) 


Chairman: Herbert Solomon, Columbia University 
Papers: 1. Classification by Configuration and Level, E. I. BURDOCK, New York Department
of Mental Hygiene
2. Application of Analysis of Variance Techniques to Profile Analysis, S. W.
GREENHOUSE and S. GEISSER, National Institute of Mental Health
3. Classification by Response Vectors Based on n Dichotomous Items, ROSEDITH
SITGREAVES, Columbia University


3:30-5:30 p.m. Regression Models and Multifactor Designs 


Chairman: Churchill Eisenhart, National Bureau of Standards 
Papers: 1. Some Mathematical Problems Arising in the Construction of Fractionally
Replicated Design and Designs for the Study of Response Surfaces, R. C.
BOSE, University of North Carolina
2. Reduced Regression Models, M. B. WILK, Bell Telephone Laboratories
5:30 p.m. Business Meeting


President: A.M. Mood, General Analysis Corporation 


8:00 p.m. Karl Pearson—An Appreciation on the 100th Anniversary of His 
Birth (With the American Statistical Association in General Session) 


Chairman: Burton H. Camp, Wesleyan University 

Speakers: HELEN M. WALKER, Columbia University
SAMUEL STOUFFER, Harvard University
CHURCHILL EISENHART, National Bureau of Standards
JERZY NEYMAN, University of California, Berkeley


9:00 p.m. 1958 Council Meeting 


President: Leonard J. Savage, University of Chicago 


THURSDAY, SEPTEMBER 12, 1957 


8:30-10:30 a.m. Distinctive Characteristics of Psychometric Statistics (With 
the Biometric Section (ASA) and Biometric Society (ENAR))


Chairman: George E. Nicholson, Jr., University of North Carolina 
Papers: 1. Problems of Metric, HAROLD GULLIKSEN, Educational Testing Service and
Princeton University
2. Problems Arising from Errors of Measurement, FREDERIC M. LORD, Educational
Testing Service
Discussants: HERBERT SOLOMON, Columbia University
JOHN W. TUKEY, Princeton University


9:00-11:30 a.m. Contributed Papers I 


Chairman: Robert Hooke, Westinghouse Research Laboratories 
Papers: 1. On an Optimal Property of Variance-components Estimates, WERNER
GAUTSCHI, Indiana University

2. Quantization for Least Mean Squares Error, STUART P. LLOYD, Bell Telephone
Laboratories, Inc.

3. On the Equality of the Variances of Several Univariate Normal Populations 
and Some Multivariate Extensions, R. GNANADESIKAN, University of 
North Carolina 

4. Further Contributions to Confidence Bounds on Multivariate Variance Components,
S. N. ROY and R. GNANADESIKAN, University of North Carolina

5. Effects and the Classical Analysis of Variance Mixed Model, MARY D. LUM,
Wright Air Development Center (By Title)

6. On the Distribution of the Latent Roots of Real Symmetric Random Matrices
with Multinormally Distributed Elements, H. ROBERT VAN DER VAART,
University of North Carolina and Leiden University, The Netherlands

7. Mathematical Developments in the Theory of Human Lethal Dose (Preliminary
Report), CLIFFORD J. MALONEY, Fort Detrick

8. An Extension of Box’s Results on the Use of the F Distribution in Multivariate
Analysis, SEYMOUR GEISSER and SAMUEL W. GREENHOUSE, National
Institute of Mental Health

9. On the Decomposition of Certain Chi-Square Variables, ROBERT V. HOGG and
ALLEN T. CRAIG, University of Iowa (By Title)

10. On the Non-optimality of Symmetrical Designs Among Randomized Designs,
J. C. KIEFER, Cornell University (By Title)

11. On the (Nonrandomized) Optimality of Symmetrical Designs, J. C. KIEFER,
Cornell University (By Title)

12. A Generalization of the Discriminant Function Analysis (Preliminary Report),
M. M. RAO, University of Minnesota (By Title)

13. Asymptotic Independence of Tests of Parametric Forms of Cell Probabilities
in the Analysis of Categorical Data, EARL L. DIAMOND, University of
North Carolina (By Title)

14. Tests of Parametric Forms of Cell Probabilities and Their Asymptotic Power
in the Analysis of Categorical Data, EARL L. DIAMOND, University of
North Carolina (By Title)

15. Bias in Certain Current Procedures of Response Surface Estimation, H.
ROBERT VAN DER VAART, University of North Carolina and Leiden
University, The Netherlands (By Title)

16. Statistical Estimate and Control of the Costs Caused by Accident in a Factory,
HANS BÜHLMANN, University of California, Berkeley


9:00-11:30 a.m. Contributed Papers II 


Chairman: Julius Lieblein, Navy Department 
Papers: 1. Conditions that a Stochastic Process Be Ergodic, EMANUEL PARZEN, Stanford
University

2. Some Renewal Processes Related to Types I and II Counter Models, RONALD
PYKE, Stanford University

3. Contributions to the Theory of Random Mappings, BERNARD HARRIS, Stanford
University

4. Exact Probabilities and Asymptotic Relationships for Some Statistics from
m-th Order Markov Chains, LEO A. GOODMAN, University of Chicago

5. The Telephone Trunking Problem (Preliminary Report), HERBERT SCARF,
The RAND Corporation

6. Birth and Death Random Walk Process in s Dimensions, J. NEYMAN and
ELIZABETH L. SCOTT, University of California

7. Random Walks in the Plane with General Absorbing Barriers, M. V. JOHNS,
JR., Stanford University

8. Distinguishability of Sets of Distributions (The Case of Independent and
Identically Distributed Chance Variables), W. HOEFFDING, University of
North Carolina and J. WOLFOWITZ, Cornell University

9. On Aggregation and Consolidation in Finite Substochastic Systems, I, DAVID
ROSENBLATT, American University (By Title)

10. On Aggregation and Consolidation in Finite Substochastic Systems, II, DAVID
ROSENBLATT, American University (By Title)

11. On Aggregation and Consolidation in Finite Substochastic Systems, III,
DAVID ROSENBLATT, American University (By Title)

12. On Aggregation and Consolidation in Finite Substochastic Systems, IV,
DAVID ROSENBLATT, American University (By Title)

13. Age-dependent Branching Stochastic Processes in Cascade Theory, II. Case of
Transformation Probabilities as Functions of Absorber Depth, W. MAX
WOODS, Stanford University, and A. T. BHARUCHA-REID, University of
Oregon (By Title)

14. Asymptotic Distributions of Some Goodness of Fit Criteria for m-th Order
Markov Chains, LEO A. GOODMAN, University of Chicago (By Title)

15. Runs Tests and Likelihood Ratio Tests for Markov Chains, LEO A. GOODMAN,
University of Chicago (By Title)


11:30-12:30 p.m. Wald Lecture III 


Chairman: David Blackwell, University of California, Berkeley 
Paper: 1. Pólya-Type Theory, SAMUEL KARLIN, Stanford University


2:00-3:30 p.m. Special Invited Paper 


Chairman: Harold Hotelling, University of North Carolina 
Paper: 1. Estimation of Economic Relations, CLIFFORD HILDRETH, Michigan State
University 


2:00-4:00 p.m. Applications of Computers to Statistical Problems (with the 
American Statistical Association in General Session). 


Chairman: Joseph F. Daly, Bureau of the Census 
Papers: 1. Some Experience with Electronic Computers in Processing Economic Census
Data, MAXWELL R. CONKLIN and OWEN C. GRETTON, Bureau of the
Census
2. Getting Mass Statistical Data into Computers—A Progress Report on FOSDIC,
M. L. GREENOUGH, National Bureau of Standards
3. On the Use of Electronic Computers for Standard Statistical Calculations,
MILTON E. TERRY, Bell Telephone Laboratories, Inc.


4:00-6:00 p.m. Effect of High Speed Computing on Statistics (With the American
Statistical Association)


Chairman: R. L. Anderson, North Carolina State College 
Papers: 1. Some Examples of Computer Applications in Mathematical Statistics, GEORGE
W. BROWN, University of California at Los Angeles and General Analysis
Corporation, and A. M. MOOD, General Analysis Corporation
2. Making the Computer Cry, “Enough”!, JACK MOSHMAN, Council for Economic
and Industry Research
3. Machine Generation of Normal Deviates and Their Application to Power
Functions of Statistical Tests, MERVIN E. MULLER, Princeton University
and International Business Machines Corporation
Discussant: H. O. HARTLEY, Iowa State College


10:00 p.m. Informal Party 


FRIDAY, SEPTEMBER 13, 1957 
9:00-11:30 a.m. Contributed Papers III 


Chairman: R. A. Bradley, Virginia Polytechnic Institute 
Papers: 1. Graphic Methods Based Upon Properties of Advancing Centroids, S. I. ASKOVITZ,
University of Pennsylvania and Albert Einstein Medical Center
2. Nonparametric Mean and Variance Estimation from Truncated Data, JOHN
E. WALSH, Lockheed Aircraft Corporation (By Title)
3. A Table of the Expected Value of the Quasi-Range, H. LEON HARTER, Wright
Air Development Center
4. On the Bivariate Sign Test, ISADORE BLUMEN, Cornell University (By Title)
5. Tests of Functional Forms of Cell Probabilities and Their Asymptotic Power
in the Analysis of Categorical Data, EARL L. DIAMOND, University of
North Carolina

6. Asymptotic Normality and Efficiency of Certain Nonparametric Test Statistics,
HERMAN CHERNOFF, Stanford University, and I. RICHARD SAVAGE,
Stanford University and The Center for Advanced Study in the Behavioral
Sciences

7. Extension of the Mann-Whitney “U” Test to Samples Censored at the Same
Fixed Point, MAX HALPERIN, National Institutes of Health

8. Generalisation of Steinhaus’ Results on Fair Division, PETER NEWMAN,
University College of the West Indies (By Title)

9. Distribution of a Serial Correlation Coefficient Near the Ends of the Range,
M. M. SIDDIQUI, University of North Carolina (By Title)

10. An Extension of the Theory of Cumulative Frequency Functions, DR. BERNARD
J. DERWORT, North American Aircraft Corp., and DR. WALDO A. VEZEAU,
St. Louis University (By Title)

11. On Some Distribution-free Bias Properties of the Latent Roots of Real Symmetric
Random Matrices, H. ROBERT VAN DER VAART, University of North
Carolina and Leiden University, The Netherlands (By Title)

12. On Convergence of Distribution Functions and of Moments of Order Statistics,
M. Ronatet, University of California, Berkeley (By Title)

13. The Significance Probability of the Smirnov Two-sample Test, J. L. HODGES,
JR., University of California, Berkeley (By Title)

14. On Ranking Parameters of Location and Scale in Continuous Populations,
K. C. SEAL, Calcutta University (By Title)

15. Estimation of Regression Coefficients in Certain Classes of Continuous Parameter,
Stationary Time Series (Preliminary Report), G. E. ALBERT,
University of Tennessee (By Title)


9:00-11:30 a.m. Contributed Papers IV 


Chairman: John Gurland, Iowa State College 
Papers: 1. Tests for Significance in Bivariate Harmonic Analysis, HAROLD HOTELLING,
University of North Carolina, and DONALD F. MORRISON, National
Institute of Mental Health
2. Testing Homogeneity of Means in the Presence of Heterogeneity of Variance,
JOHN GURLAND, Iowa State College, and LLOYD ROSENBERG
3. The Limiting Distribution of a Likelihood Ratio Test for the Serial Correlation
Coefficient, JOHN S. WHITE, Minneapolis-Honeywell Regulator Company
(By Title)
4. Tests of Multiple Independence and the Associated Confidence Bounds, S. N.
ROY and R. E. BARGMANN, University of North Carolina
5. Confidence Bounds on the “Ratio of Means” and “Ratio of Variances” for
Correlated Variates, S. N. ROY and R. F. POTTHOFF, University of North
Carolina
6. Bayes Acceptance Sampling Procedures for Large Lots, DONALD GUTHRIE,
JR., and M. V. JOHNS, JR., Stanford University
7. Best Unbiased Tests of Composite Hypotheses with s Constraints, M. Ronater,
University of California, Berkeley
8. On Tests of a Certain Hypothesis Based Upon Selected Sample Quantiles,
JUNJIRO OGAWA, University of North Carolina
9. Jacobi Polynomials and Distribution of Some Serial Correlation Coefficients,
M. M. SIDDIQUI, University of North Carolina (By Title)
10. Tabulation of the Trivariate Normal Integral (Preliminary Report), GEORGE
P. STECK, Sandia Corporation (By Title)

11. Equally Spaced Levels for Multi-level Continuous Sampling Plans (Preliminary
Report), DONALD GUTHRIE, JR., and M. V. JOHNS, JR., Stanford
University (By Title)

12. On the Numerical Computation of Certain Multivariate Normal Integrals, H.
ROBERT VAN DER VAART, University of North Carolina and Leiden University,
The Netherlands (By Title)

13. On the Asymptotic Distribution of the Likelihood-ratio in Some Mixed Variates
Populations, JUNJIRO OGAWA, University of North Carolina (By Title)

14. Further Results in Testing of Hypotheses on a Multivariate Population, Some
of the Variates Being Continuous and the Rest Categorical, M. D. MOUSTAFA,
University of North Carolina (By Title)

15. A Matrix Definition of the Correlation Between Two Sets of Variables, ANDRE
G. LAURENT, Michigan State University and Wayne State University
(By Title)


10:30-12:30 p.m. Demand and Supply of Statisticians Now and in the Future 
(With the Section on Training (ASA))


Chairman: Virgil Anderson, Purdue University 
Papers: 1. From the Standpoint of Government, Daviv L. Furransky, Bureau of the 
Census 
2. From the Standpoint of Colleges, T. A. BANCROFT, Iowa State College
3. From the Standpoint of Industry, CUTHBERT DANIEL, Private Consultant
4. From the Survey of Both Industry and Colleges, BOYD HARSHBARGER, Virginia
Polytechnic Institute


11:30 a.m.-12:30 p.m. Wald Lecture IV 


Chairman: Herman Chernoff, Stanford University 
Paper: 1. Pólya-Type Theory, SAMUEL KARLIN, Stanford University


2:00-3:30 p.m. Special Invited Paper 
Chairman: Wassily Hoeffding, University of North Carolina 


Paper: 1. Distribution-free Methods in Statistics, E. J. G. PITMAN, University of Tasmania
and Stanford University


3:30-5:30 p.m. Statistical Spectral Analysis of Stationary Time Series 


Chairman: Julius R. Blum, Indiana University 


Papers: 1. On Asymptotically Efficient Consistent Estimates of the Spectral Density
Function of a Stationary Time Series, EMANUEL PARZEN, Stanford University
2. The Prediction Problem and the Structure of a Class of Stationary Processes,
MURRAY ROSENBLATT, Indiana University
3. On the Joint Estimation of the Spectra, Cospectrum and Quadrature Spectrum
of a Two-Dimensional Stationary Gaussian Process, N. R. GOODMAN,
American Cyanamid Company and New York University
Discussant: JOHN W. TUKEY, Princeton University and Bell Telephone Laboratories
REPORT OF THE PRESIDENT FOR 1957


The Institute continued to grow and to prosper and to increase its influence in 
statistical affairs during the past year. Our journal has published more and a 
greater variety of papers than ever before; our meetings have attracted more 
participants than ever before. Our membership has increased as has the number 
of institutional members. The programs of our meetings are more extensive and 
have a greater density of invited lectures. 

Besides this fine annual meeting we have had a very successful regional meet- 
ing in Washington, D. C. attended by 145 members. The program for the Wash- 
ington meeting was arranged by a committee under the chairmanship of M. Zelen. 
The present Eastern Regional Program Committee chaired by Herbert A. Meyer 
is organizing a meeting in Tennessee for next spring. 

The excellent program for this annual meeting was arranged by a committee 
named below chaired by W. H. Clatworthy. Large contributions were made to it 
by the Special Invited Papers Committee and the Program Coordinator, Martin 
Wilk. 

The Western Regional Program Committee chaired by David Stoller has 
arranged a program for a Los Angeles meeting in December. The Central Regional
Program Committee chaired by Jack Silber is planning a meeting for next
spring or summer, possibly at Ames, Iowa.

The Institute receives financial support from the National Science Foundation 
to sponsor a Summer Statistical Institute. Last summer we received $11,600 to 
cover expenses of a six weeks seminar on Analysis of Variance. It was held at 
the University of Colorado under the chairmanship of Oscar Kempthorne. Some 
twenty members of the Institute participated. A committee headed by Herman 
Chernoff selected Nonparametric Statistical Inference as the subject of the 1958 
Summer Statistical Institute. A committee of statisticians especially interested in 
nonparametric methods headed by I. R. Savage has prepared a proposal for the 
NSF and is organizing that seminar. 

The Special Invited Paper Committee under the chairmanship of Gerald 
Lieberman had an unusually big job this year in that it had the responsibility 
for both the Rietz and the Wald lectures in addition to the hour speakers. We 
are hearing at this Annual Meeting the excellent series of Wald lectures by Sam- 
uel Karlin; of the three special invited papers we are hearing, the two by D. L. 
Wallace and Clifford Hildreth were arranged by last year’s committee and the 
one by E. J. G. Pitman by this year’s committee, which has also arranged
several papers for future meetings. 

Walter Bartky and L. J. Savage as delegates of the IMS attended a meeting 
of representatives of several mathematical societies to explore the possibility of 
forming a super organization which would perform certain functions in the area 
common to all of them, thus putting the Policy Committee for Mathematics on a 
more formal basis. The organization will doubtless be created in due course and 
will probably include some ten or so societies primarily concerned with pure and
applied mathematics. 

The affairs of the Institute are carried out by a large number of members who 
unselfishly devote a part of their time to the many tasks necessary to operating 
a society of this kind. Our members are extremely cooperative and responsive to 
their professional duties in this connection. During the year I have had occasion 
to ask over one hundred people to do something for the Institute; of those only 
three declined. That is a remarkably high response and I am most grateful to all 
for making my job so easy. 

The names of all these people who ran the Institute last year are listed below 
in their various capacities, but let me mention especially three people who are 
primarily responsible for seeing that all goes well and who devote a substantial 
proportion of their time to the Institute. They are G. E. Nicholson, Secretary, 
A. H. Bowker, Treasurer and T. E. Harris, Editor. They stay with their jobs 
for several years in succession thus providing continuity and stable management 
for the whole organization. 

The last of the many Committees which it is my duty to appoint is the Nomi- 
nating Committee for next year; it consists of David Blackwell (Chairman), 
R. L. Anderson, Leo Goodman, H. O. Hartley, Lincoln Moses, and Herbert 
Solomon. 

In closing let me congratulate the new Fellows of the Institute who are: 


Oskar Anderson J. M. Hammersley 
Douglas Chapman G. B. Kallianpur 
Will Dixon Eugene Lukacs 

I. J. Good G. R. Rao 


ALEXANDER M. MOOD
President 


IMS OFFICERS, COMMITTEES, AND REPRESENTATIVES FOR 1957 


COUNCIL: 

Terms Expire 1957    Terms Expire 1958     Terms Expire 1959
R. L. Anderson       R. C. Bose            T. W. Anderson
Leo Goodman          Churchill Eisenhart   M. S. Bartlett
P. G. Hoel           Oscar Kempthorne      J. Berkson
L. J. Savage         W. J. Youden          E. L. Lehmann
Herbert Solomon

President A. M. Mood 

President-Elect L. J. Savage 

Secretary G. E. Nicholson 

Treasurer A. H. Bowker 

Editor T. E. Harris 


PROGRAM COORDINATOR: M. B. Wilk 

ASSISTANT AND ASSOCIATE SECRETARIES: Jerome Cornfield, Evelyn Fix, Doro- 
thy Gilford, William Kruskal. 

ASSOCIATE TREASURER: E. S. Pearson.

ASSOCIATE EDITORS: Z. W. Birnbaum, David Blackwell, Herman Chernoff, W. J. 
Dixon, J. M. Hammersley, J. Wolfowitz. 
COMMITTEE ON FELLOWS:

Terms Expire 1957    Terms Expire 1958    Terms Expire 1959
L. J. Savage         F. J. Anscombe       Z. W. Birnbaum
  Chairman
M. S. Bartlett       Leo Goodman          Elizabeth Scott


APPOINTED COMMITTEES 


PROGRAM COMMITTEE FOR ANNUAL MEETING: W. H. Clatworthy (Chairman), 
R. R. Bahadur, Grace Bates, J. R. Blum, R. E. Greenwood, F. E. Grubbs, J. C. Kiefer,
Paul Meier, Herbert Robbins, Elizabeth Scott. 

PROGRAM COMMITTEE FOR EASTERN REGION: H. A. Meyer (Chairman), Ralph 
Bradley, R. E. Bechhofer, Cuthbert Daniel, Sylvain Ehrenfeld, B. G. Greenberg, D. G.
Horvitz, E. L. Kaplan. 

PROGRAM COMMITTEE FOR CENTRAL REGION: Jack Silber (Chairman), Virgil 
Anderson, F. C. Andrews, Charles Bell, H. T. David, F. A. Graybill, E. R. Immel, Ber- 
nard Ostle. 

PROGRAM COMMITTEE FOR WESTERN REGION: David Stoller (Chairman), John 
Hofmann, Richard Link, Frank Massey, Emanuel Parzen, Marion Sandomire, Lionel 
Weiss, Robert Wijsman. 

COMMITTEE ON LECTURES AND INVITED PAPERS: Gerald Lieberman (Chairman), 
W. S. Connor, Lucien LeCam, G. E. Noether, D. L. Wallace, T. E. Harris (ex officio),
M. B. Wilk (ex officio). 

COMMITTEE ON INDIVIDUAL MEMBERSHIP: Meyer Dwass (Chairman), David 
Duncan, Donald Fraser, M. N. Ghosh, Shanti Gupta, David Kendall, James Pachares, 
Henry Teicher, Louis Wegner. 

COMMITTEE ON ACADEMIC INSTITUTIONAL MEMBERS: R. G. D. Steel (Chair- 
man), Donald Darling, Rudolph Freund, Palmer Johnson, A. L. Comrey, Howard Tucker. 

COMMITTEE ON NON-ACADEMIC INSTITUTIONAL MEMBERS: Stanley Rothman 
(Chairman), S. L. Anderson, Stanley Isaacson, Daniel Teichroew, Benjamin Tepping,
John E. Walsh.

COMMITTEE ON SUBSCRIPTIONS TO THE ANNALS: Lila Elveback (Chairman),
Joe Adams, K. A. Bush, Edward Coleman, Harry Harman, T. A. Jeeves. 

NOMINATING COMMITTEE (APPOINTED BY DAVID BLACKWELL, PAST- 
PRESIDENT): Charles Stein (Chairman), Joseph Berkson, Kai Lai Chung, John Curtiss, 
David Kendall. 

FINANCE COMMITTEE: Melvin Peisakoff (Chairman), A. H. Bowker, Cuthbert Hurd, 
Theodore Yntema. 

COMMITTEE ON PRINTING THE ANNALS: Kenneth Arnold, Chairman, A. H. Bowker,
J. H. Curtiss, T. E. Harris, William Kruskal, Sigeiti Moriguti, D. van Dantzig.

COMMITTEE TO ORGANIZE 1957 SUMMER STATISTICAL INSTITUTE: Oscar
Kempthorne, Chairman, T. W. Anderson, J. Cornfield, J. W. Tukey, Henry Scheffé.

COMMITTEE TO PLAN 1958 SUMMER STATISTICAL INSTITUTE: Herman Chernoff, 
Chairman, Leo Hurwicz, L. J. Savage, Lionel Weiss. 

COMMITTEE TO ORGANIZE 1958 SUMMER STATISTICAL INSTITUTE: I. R. Savage,
Chairman, J. L. Hodges, W. Hoeffding, William Kruskal, Frank Wilcoxon.

COMMITTEE ON PROFESSIONAL STANDARDS OF STATISTICIANS IN GOVERN- 
MENT SERVICE: B. F. Kimball, Chairman, Robert W. Burgess, Churchill Eisenhart, 
G. M. Harrington, A. S. Householder, Joseph Lev, Herbert Marshall, Robert E. Patton,
John E. Walsh. 

COMMITTEE ON MATHEMATICAL TABLES: J. W. Tukey, Chairman, R. L. Anderson, 
A. H. Bowker, E. E. Cureton, W. J. Dixon, C. W. Dunnett, Churchill Eisenhart, J. A. 
Greenwood, H. O. Hartley, William Kruskal, D. B. Owen, Daniel Teichroew, M. A. Wood- 
bury. 
COMMITTEE ON EXCHANGES: P. S. Dwyer, A. H. Bowker, T. E. Harris, G. E. Nicholson.

AD HOC COMMITTEE ON HIGH SPEED MACHINES: R. L. Anderson, Chairman, F. S.
Acton, K. J. Arnold, J. Moshman, H. W. Norton, G. J. Resnikoff, C. F. Cossack, W. H. 
Kruskal, W. J. Merrill, H. A. Meyer, R. Slimak, Z. Szatrowski, D. Teichroew. 


REPRESENTATIVES TO OTHER ORGANIZATIONS 
AMERICAN ASSOCIATION FOR THE ADVANCEMENT OF SCIENCE: Harold Hotel- 
ling 
NATIONAL RESEARCH COUNCIL, DIVISION OF MATHEMATICS: S. S. Wilks
(until 1 July 1957), W. Allen Wallis (beginning 1 July 1957). 
POLICY COMMITTEE FOR MATHEMATICS: Joseph F. Daly 
ADVISORY COMMITTEE OF AMERICAN STANDARDS ASSOCIATION ON ISO/ 
TC69, STATISTICAL TREATMENT OF OBSERVATIONS: Howard Raiffa 
COUNCIL OF U.S. CENSUS USERS: W. E. Deming 


REPORT OF THE SECRETARY FOR 1957 


During 1957 to date The Institute has held its 73rd through 74th meetings. A
business meeting was held during the 74th (20th Annual) meeting. The Program
Committees are to be congratulated on the excellent programs which have been 
arranged under the immediate direction of Marvin Zelen and W. H. Clatworthy 
with the overall guidance of our Program Coordinator, M. B. Wilk. The Assistant 
Secretaries, Eugene Lukacs and Jerome Cornfield, are to be congratulated on the
physical arrangements, and the Associate Secretary, Dorothy M. Gilford, on 
her performance of the duties of the Secretary with respect to the meetings. 

GEORGE E. NICHOLSON, JR.
Secretary 


MINUTES OF THE ANNUAL BUSINESS MEETING SEPTEMBER 11, 1957 


The annual business meeting of The Institute of Mathematical Statistics was 
called to order at 5:30 p.m. September 11, 1957, in the Surf Room of the Ambas- 
sador Hotel in Atlantic City, New Jersey, by President Alexander M. Mood. 
Approximately 60 members were present. 

Minutes of the August 22, 1956, business meeting held in Seattle, Washington, 
were approved. 

Reports of the Secretary, Treasurer, Editor, and Program Coordinator were 
presented and approved. The work of the Associate Secretary, Dorothy M. 
Gilford, Assistant Secretary Jerome Cornfield, and Program Coordinator Martin 
Wilk in organizing the meeting was commended by the Secretary. President 
Mood called special attention to the work done by Program Committee Chair- 
man William Clatworthy and extended his personal thanks together with that 
of The Institute. This action was applauded by the members. 

The tellers distributed ballots to those members who had not voted by mail. 


President Mood presented his report and turned the chair over to the new 
President L. J. Savage. 
Herman Rubin objected to the time of scheduling the business meeting, to
the timing of the annual meeting in relation to the time of the American Math- 
ematical Society meeting, and to the fact that contributed papers sessions had 
been scheduled so that there were conflicts. David Stoller moved that these ob- 
jections be referred to the Program Committee. This motion passed. Doctor 
Rubin moved that the Program Committee be instructed to avoid conflicts of 
contributed papers sessions. This motion did not pass. 

Ingram Olkin was appointed Chairman of an ad hoc committee to investigate 
possibilities for making translations of papers published in Russian which are of 
interest to members of The Institute more readily available, and Eugene Lukacs 
was appointed a member of this committee. Irving Burr was appointed Chairman 
of an ad hoc committee to consider how to overcome the lack of adequate black- 
boards in hotels. Herbert Robbins and Martin Wilk were asked to serve on this 
committee. There was a discussion of the objection that too many invited papers 
had been scheduled for the annual meeting. 

A new policy for abstracts was announced. In the future abstracts should be 
in the hands of the Editor of THE ANNALS fifty days prior to the meeting 
instead of 47 days as had been the previous policy. Julius Blum suggested that 
notices of hotels to be used for annual meetings be sent out as soon as possible 
so members can get less expensive rooms. Irving Burr and Harold Hotelling 
spoke of the advantages of universities over hotels as meeting places. 

Next year is the last year of the three-year trial of having no annual Christmas 
meeting. Members should express opinions about this to Council members. 

The tellers announced the election of the following officers: J. Wolfowitz, 
President-Elect, David Blackwell, Council Member, 1958-1960, H. Hotelling, 
Council Member, 1958-1960, J. Neyman, Council Member, 1958-1960, I. R. 
Savage, Council Member, 1958-1960. 

The meeting was adjourned about 7:00 p.m. 


PUBLICATIONS RECEIVED 


Roy, Rene, Cahiers du Seminaire D’Econometrie: No. 4—Programme lineaire—Agregation 
et nombres indices. Editions du Centre National de la Recherche Scientifique 13, Quai 
Anatole-France Paris-7, 146 pp. 

Mann, Henry B., Introduction to the Theory of Stochastic Processes Depending on a Continuous
Parameter, Ohio State University Press, Columbus 10, Ohio, $0.35.

OWEN, D. B., Tables of the Normal Probability Integral, Technical Memorandum 64-57-51,
Sandia Corporation, Albuquerque, New Mexico. 

WALD, ABRAHAM, Selected Papers in Statistics and Probability, Stanford University Press,
Stanford, California, 1957, ix + 702 pp., $10.00. 

VARNER, WALTER W., Computing with Desk Calculators, Rinehart and Company, Inc., New 
York, 1957, viii + 108 pp., $2.00. 

TUTTLE, ALVA M., Elementary Business and Economic Statistics, McGraw-Hill Book Company,
New York, 1957, xiii + 663 pp., $6.75.
Fractional Factorial Experiment Designs for Factors at Two Levels, National Bureau of
Standards Applied Mathematics Series 48, 85 pages, issued April 15, 1957, $0.50. 

Standard Samples and Reference Standards, National Bureau of Standards, Circular 552 
(second edition), 24 pages, $0.25. (Order from the Superintendent of Documents, 
Government Printing Office, Washington 25, D. C.) 

Koopmans, TJALLING C., Three Essays on The State of Economic Science, McGraw Hill Book 
Company, New York, vii + 231 pp. $6.50. 

PILLAI, K. C. SREEDHARAN, Concise Tables for Statisticians, Bookman, Inc. Manila (1957)
pp. v + 50, $3.00. 

FELLER, WILLIAM, An Introduction to Probability Theory and its Applications, 2d Ed., John
Wiley and Sons, Inc., New York, vii + 461 pp., $10.75. 




INSTITUTIONAL MEMBERS 


AMERICAN VISCOSE CORPORATION, Marcus Hook, Pennsylvania.

BELL TELEPHONE LABORATORIES, INC., TECHNICAL LIBRARY, 463 West Street, New York 14,
New York.

INTERNATIONAL BUSINESS MACHINES CORPORATION, New York

IOWA STATE COLLEGE, STATISTICAL LABORATORY, Ames, Iowa

MASSACHUSETTS INSTITUTE OF TECHNOLOGY, HAYDEN LIBRARY, PERIODICAL DEPARTMENT,
Cambridge 39, Massachusetts

MICHIGAN STATE UNIVERSITY, DEPARTMENT OF MATHEMATICS, East Lansing, Michigan

NATIONAL SECURITY AGENCY, Washington 25, D. C.

PRINCETON UNIVERSITY, DEPARTMENT OF MATHEMATICS, SECTION OF MATHEMATICAL
STATISTICS, Princeton, New Jersey

PURDUE UNIVERSITY LIBRARIES, Lafayette, Indiana

STATE UNIVERSITY OF IOWA, Iowa City, Iowa

THE RAMO-WOOLDRIDGE CORPORATION, Los Angeles, California

UNIVERSITY OF CALIFORNIA, STATISTICAL LABORATORY, Berkeley, California

UNIVERSITY OF ILLINOIS, Urbana, Illinois

UNIVERSITY OF NORTH CAROLINA, DEPARTMENT OF STATISTICS, Chapel Hill, North Carolina

UNIVERSITY OF WASHINGTON, LABORATORY OF STATISTICAL RESEARCH, Seattle, Washington








BIOMETRIKA 


Volume 44 Contents Parts 3 and 4, December 1957 


KARL PEARSON, 1857-1957. Centenary lecture by J. B. S. HALDANE. LESLIE, P. H. An analysis of the data
for some experiments carried out by Gause with populations of the protozoa, Paramecium aurelia and Paramecium
caudatum. COX, D. R. & SMITH, W. L. On the distribution of Tribolium confusum in a container.
WATSON, G. S. The χ² goodness-of-fit test for normal distributions. SATHE, Y. S. & KAMAT, A. R. Approximations
to the distributions of some measures of dispersion based on successive differences. HAIGHT, F.
Queueing with balking. DURBIN, J. Testing for serial correlation in systems of simultaneous regression equations.
CURNOW, R. N. Heterogeneous error variances in split-plot experiments. Hanazis, A. J. A maximum-minimum
problem related to statistical distributions in two dimensions. ROY, S. N. & GNANADESIKAN, R.
Further contributions to multivariate confidence bounds. STEVENS, W. L. Shorter intervals for the parameter
of the binomial and Poisson distributions. JOWETT, G. H. Statistical analysis using local properties of
smoothly heteromorphic stochastic series. ANSCOMBE, F. J. Dependence of the fiducial argument on the
sampling rule. FIELLER, E. C., HARTLEY, H. O. & PEARSON, E. S. Tests for rank correlation coefficients. I.
BERKSON, J. Tables for use in estimating the normal distribution function by normit analysis. MOORE,
P. G. The two-sample t-test based on range. FOSTER, F. G. Upper percentage points of the generalized beta-distribution.
II. DOIG, ALISON. A bibliography on the theory of queues.

Miscellanea—Contributions by M. S. BARTLETT, B. E. COOPER, N. L. JOHNSON, D. S. PALMER, A.
R. THATCHER, J. W. TUKEY


Corrigenda—D. R. Cox Reviews Other Books Received 
The subscription, payable in advance, is now 54/- (or $8.00), per volume (including postage). Cheques should
be made payable to Biometrika, crossed "a/c Biometrika Trust" and sent to the Secretary, Biometrika Office,
Department of Statistics, University College, London, W.C.1. All foreign cheques must be drawn on a Bank
having a London agency.

Issued by THE BIOMETRIKA OFFICE, University College, London 


Colloquium Publications, Volume XXXI, Revised edition.
FUNCTIONAL ANALYSIS 
AND SEMI-GROUPS 


by EINAR HILLE and RALPH S. PHILLIPS 


This is a completely revised and largely rewritten edition of the
Colloquium volume of the same name by the first author. The framework
of the earlier book has been kept, but the subject matter has
been rearranged and much expanded. The algebraic tools are introduced
much earlier and put to important use throughout the treatise.
Functional analysis occupies almost one third of the book. The sections
devoted to the analytical theory of semi-groups have been
augmented by new material on perturbation theory, adjoint theory,
spectral theory, operational calculus, and stochastic theory.

$13.80 805 pages 

25% discount to members of the Society 


Order from 


AMERICAN MATHEMATICAL SOCIETY 
190 Hope Street, Providence 6, R.I. 





ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 25, No. 4 - October, 1957


Kenneth J. Arrow ..... Statistics and Economic Policy
Kenneth J. Arrow ..... Utilities, Attitudes, Choices: A Review Note
Hendrik S. Houthakker ..... An International Comparison of Household Expenditure
Patterns, Commemorating the Centenary of Engel's Law
Alan S. Manne ..... A Linear Programming Model of the U. S. Petroleum Refining Industry
James Tobin ..... Estimation of Relationships for Limited Dependent Variables
Martin Beckmann ..... Some Aspects of the Airline Reservations Problem
Pe CAMs ..... Tentative de Détermination Empirique de Fonctions de
Production pour les Pays Industriels
—— Wagner ..... A Monte Carlo Study of Estimates of Simultaneous Linear Structural Equations
. Stuvel ..... The Impact of Changes in the Terms of Trade on Western Europe's Balance of Payments
Fisher ..... A Sector Model: The Poultry Industry of the U. S. A.


Book Reviews


Automata Studies, (C. E. Shannon and J. McCarthy, eds.). Review by Harry H. Goode

Capital and Its Structure, (L. M. Lachmann). Review by William P. Yohe

Economic Progress, (Leon H. Dupriez, ed.). Review by Leif Johansen

Foundations of Productivity Analysis, (Bela Gold). Review by C. F. Carter

Hire Purchase Credit in South Africa, (T. Van Waasdijk). Review by Clark Warburton 

International Economic Papers No. 4: Translations Prepared for the International Economic Association,
(Peacock, Turvey, Stolper, and Henderson, eds.). Review by Paul M. Sweezy

Marketing Efficiency in Puerto Rico, (Galbraith, Holton, et al.). Review by Lester G. Telser

On Economic Theory and Socialism, Collected Papers, (Maurice Dobb). Review by Kenneth O. May 

Methodologie economique, (Gilles-Gaston Granger). Review by Sten Thore 

Statistics: A New Approach, (W. A. Wallis and H. V. Roberts). Review by Maurice Quenouille 

Rapport sur les comptes de la nation. Review by Walter Froehlich 

Structural Interdependences of the Economy: Proceedings of an International Conference on Input-Output Analysis 
(T. Barna, ed.). Review by Robert Solow

Studies in the Economics of Transportation, (Beckmann, McGuire, Winsten, and Koopmans). Review by R.


Zinstheorie, (F. A. Lutz). Review by Joseph Aschheim 


Die Ermittlung der wirtschaftlichen Nutzungsdauer von Anlageguetern, (Dr. Hansrudolf von Briel). Review by 
Eric Schiff 








JOURNAL OF THE 
ROYAL STATISTICAL SOCIETY 


Series B ( Methodological) 


30s per part     Vol. 19 No. 2 1957     Ann. Sub. incl. post £3 1s 0d

CONTENTS

Problems in the Probability Theory of Storage Systems .... J. Gani
Some Problems in the Theory of Dams .... D. G. Kendall (With Discussion)
Routine Analysis of Replicated Experiments on an Electronic Computer .... F. Yates, M. J. R. Healy and S. Lipton (With Discussion)
Minimax Procedure for choosing between Two Populations using Sequential Sampling .... R. J. Maurice
Sufficient Statistics, Similar Regions and Distribution-free Tests .... G. S. Watson
On the Use of the Normal Approximation in the Treatment of Stochastic Processes .... P. Whittle
The Variance of the Mean of a Stationary Process .... E. J. Hannan
The Comparison of Regression Variables .... E. J. Williams
Methods of Construction and Analysis of Serially Balanced Sequences .... M. R. Sampford
The Modified Latin Square .... B. Rojas and R. F. White
Analysis of Variance as an Alternative to Factor Analysis .... M. A. Creasy
Some Further Results in the Non-Equilibrium Theory of a Simple Queue .... N. T. J. Bailey
Normal Approximation to Machine Interference with Many Repair Men .... P. Naor
Stationary Distributions of the Negative Exponential Type for the Infinite Dam .... J. Gani and N. U. Prabhu


The Royal Statistical Society, 21, Bentinck Street, London, W. 1














SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 





Vol. 18, Parts 1 & 2, 1957 


On the Performance Characteristic of Certain Methods of Determining Confidence Limits .... B. M. Bennett
Sensitivity of a Proposed Method of Quality Control .... W. L. Stevens
National Sample Survey Number Eight: Report on Preliminary Survey of Urban Employment September
1953
Sur La Convergence Stochastique Au Sens De Cesaro Et Sur Des Differences Importantes Entre La
Convergence Presque Certaine Et Les Convergences En Probabilite Et — a .... D. Dugué
Maximum Likelihood Estimation For the Multinomial Distribution .... C. Radhakrishna Rao
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